RECOMBINATION OF POLYNUCLEOTIDE SEQUENCES
USING RANDOM OR DEFINED PRIMERS

The U.S. Government has certain rights in this invention pursuant to
Grant No. DE-FG02-93-CH10578 awarded by the Department of Energy and
Grant No. N00014-96-1-0340 awarded by the Office of Naval Research.

BACKGROUND OF THE INVENTION

This application is a continuation-in-part of pending U.S. patent
applications, Serial numbers 60/041,666, filed March 25, 1997; 60/045,211,
filed April 30, 1997; 60/046,256, filed May 12, 1997; and 08/905,359, filed
August 4, 1997.

1. Field of the Invention 

The present invention relates generally to in vitro methods for
mutagenesis and recombination of polynucleotide sequences. More
particularly, the present invention involves a simple and efficient method for in
vitro mutagenesis and recombination of polynucleotide sequences based on
polymerase-catalyzed extension of primer oligonucleotides, followed by gene
assembly and optional gene amplification.

2. Description of Related Art 

The publications and other reference materials referred to herein to
describe the background of the invention and to provide additional detail
regarding its practice are hereby incorporated by reference. For convenience,
the reference materials are numerically referenced and grouped in the
appended bibliography.

Proteins are engineered with the goal of improving their performance
for practical applications. Desirable properties depend on the application of
interest and may include tighter binding to a receptor, high catalytic activity,
high stability, the ability to accept a wider (or narrower) range of substrates, or
the ability to function in nonnatural environments such as organic solvents. A
variety of approaches, including 'rational' design and random mutagenesis
methods, have been successfully used to optimize protein functions (1). The
choice of approach for a given optimization problem will depend upon the
degree of understanding of the relationships between sequence, structure and
function. The rational redesign of an enzyme catalytic site, for example, often
requires extensive knowledge of the enzyme structure, the structures of its
complexes with various ligands and analogs of reaction intermediates and
details of the catalytic mechanism. Such information is available only for a
very few well-studied systems; little is known about the vast majority of
potentially interesting enzymes. Identifying the amino acids responsible for
existing protein functions and those which might give rise to new functions
remains an often-overwhelming challenge. This, together with the growing
appreciation that many protein functions are not confined to a small number
of amino acids, but are affected by residues far from active sites, has
prompted a growing number of groups to turn to random mutagenesis, or
'directed' evolution, to engineer novel proteins (1).

Various optimization procedures such as genetic algorithms (2,3) and
evolutionary strategies (4,5) have been inspired by natural evolution. These
procedures employ mutation, which makes small random changes in
members of the population, as well as crossover, which combines properties of
different individuals, to achieve a specific optimization goal. There also exist
strong interplays between mutation and crossover, as shown by computer
simulations of different optimization problems (6-9). Developing efficient and
practical experimental techniques to mimic these key processes is a scientific
challenge. The application of such techniques should allow one, for example,
to explore and optimize the functions of biological molecules such as proteins
and nucleic acids, in vivo or even completely free from the constraints of a
living system (10,11).

Directed evolution, inspired by natural evolution, involves the
generation and selection or screening of a pool of mutated molecules which
has sufficient diversity for a molecule encoding a protein with altered or
enhanced function to be present therein. It generally begins with creation of a
library of mutated genes. Gene products which show improvement with
respect to the desired property or set of properties are identified by selection
or screening. The gene(s) encoding those products can be subjected to further
cycles of the process in order to accumulate beneficial mutations. This
evolution can involve few or many generations, depending on how far one
wishes to progress and the effects of mutations typically observed in each
generation. Such approaches have been used to create novel functional
nucleic acids (12), peptides and other small molecules (12), antibodies (12), as
well as enzymes and other proteins (13,14,16). Directed evolution requires
little specific knowledge about the product itself, only a means to evaluate the
function to be optimized. These procedures are even fairly tolerant to
inaccuracies and noise in the function evaluation (15).
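
By way of illustration only, the cycle of library creation, screening and
accumulation of beneficial mutations described above may be sketched in
Python. This is a toy model and not a laboratory protocol: the names
`mutate`, `directed_evolution`, `toy_score` and `TARGET`, and all default
parameter values, are hypothetical and do not come from the specification.

```python
import random

BASES = "ACGT"

def mutate(gene, rate, rng):
    """Return a copy of `gene` with each base replaced at probability `rate`."""
    out = []
    for base in gene:
        if rng.random() < rate:
            # A point mutation always changes the base to a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def directed_evolution(parent, score, generations=5, library_size=100,
                       rate=0.02, seed=0):
    """Repeat: build a mutant library, screen it, keep the best improved gene."""
    rng = random.Random(seed)
    best = parent
    for _ in range(generations):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        top = max(library, key=score)      # screening step
        if score(top) > score(best):       # accumulate only improvements
            best = top
    return best

# Toy screen (hypothetical): fraction of positions matching an arbitrary target.
TARGET = "ACGTACGTACGTACGTACGT"

def toy_score(gene):
    return sum(a == b for a, b in zip(gene, TARGET)) / len(TARGET)

if __name__ == "__main__":
    start = "A" * len(TARGET)
    evolved = directed_evolution(start, toy_score)
    print(toy_score(start), "->", toy_score(evolved))
```

Only a scoring function is required, which mirrors the point that directed
evolution needs a means to evaluate the function rather than detailed
knowledge of the product.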

The diversity of genes for directed evolution can be created by
introducing new point mutations using a variety of methods, including
mutagenic PCR (15) or combinatorial cassette mutagenesis (16). The ability to
recombine genes, however, can add an important dimension to the
evolutionary process, as evidenced by its key role in natural evolution.
Homologous recombination is an important natural process in which
organisms exchange genetic information between related genes, increasing the
accessible genetic diversity within a species. While introducing potentially
powerful adaptive and diversification competencies into their hosts, such
pathways also operate at very low efficiencies, often eliciting insignificant
changes in pathway structure or function, even after tens of generations.
Thus, while such mechanisms prove beneficial to host organisms/species over
geological time spans, in vivo recombination methods represent cumbersome,
if not unusable, combinatorial processes for tailoring the performance of
enzymes or other proteins not strongly linked to the organism's intermediary
metabolism and survival.

Several groups have recognized the utility of gene recombination in
directed evolution. Methods for in vivo recombination of genes are disclosed,
for example, in published PCT application WO 97/07205 and US Pat. No.
5,093,257. As discussed above, these in vivo methods are cumbersome and
poorly optimized for rapid evolution of function. Stemmer has disclosed a
method for in vitro recombination of related DNA sequences in which the
parental sequences are cut into fragments, generally using an enzyme such as
DNase I, and are reassembled (17,18,19). The non-random DNA
fragmentation associated with DNase I and other endonucleases, however,
introduces bias into the recombination and limits the recombination diversity.
Furthermore, this method is limited to recombination of double-stranded
polynucleotides and cannot be used on single-stranded templates. Further,
this method does not work well with certain combinations of genes and
primers. It is not efficient for recombination of short sequences (less than 200
nucleotides (nts)), for example. Finally, it is quite laborious, requiring several
steps. Alternative, convenient methods for creating novel genes by point
mutagenesis and recombination in vitro are needed.

SUMMARY OF THE INVENTION
The present invention provides a new and significantly improved
approach to creating novel polynucleotide sequences by point mutation and
recombination in vitro of a set of parental sequences (the templates). The
novel polynucleotide sequences can be useful in themselves (for example, for
DNA-based computing), or they can be expressed in recombinant organisms
for directed evolution of the gene products. One embodiment of the invention
involves priming the template gene(s) with random-sequence oligonucleotides
to generate a pool of short DNA fragments. Under appropriate reaction
conditions, these short DNA fragments can prime one another based on
complementarity and thus can be reassembled to form full-length genes by
repeated thermocycling in the presence of thermostable DNA polymerase.
These reassembled genes, which contain point mutations as well as novel
combinations of sequences from different parental genes, can be further
amplified by conventional PCR and cloned into a proper vector for expression
of the encoded proteins. Screening or selection of the gene products leads to
new variants with improved or even novel functions. These variants can be
used as they are, or they can serve as new starting points for further cycles of
mutagenesis and recombination.

A second embodiment of the invention involves priming the template
gene(s) with a set of primer oligonucleotides of defined sequence or defined
sequence exhibiting limited randomness to generate a pool of short DNA
fragments, which are then reassembled as described above into full-length
genes.

A third embodiment of the invention involves a novel process we term
the 'staggered extension' process, or StEP. Instead of reassembling the pool of
fragments created by the extended primers, full-length genes are assembled
directly in the presence of the template(s). The StEP consists of repeated
cycles of denaturation followed by extremely abbreviated annealing/extension
steps. In each cycle the extended fragments can anneal to different templates
based on complementarity and extend a little further to create "recombinant
cassettes." Due to this template switching, most of the polynucleotides
contain sequences from different parental genes (i.e. are novel recombinants).
This process is repeated until full-length genes form. It can be followed by an
optional gene amplification step.
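
The template switching that underlies StEP can be illustrated with a
deliberately simplified simulation. The Python sketch below assumes aligned
templates of equal length, a uniformly random choice of template in every
cycle, and an extension of 20 to 50 nucleotides per cycle; these modeling
choices and the name `step_recombine` are illustrative assumptions, and the
sketch ignores strand orientation, annealing specificity and point mutation.

```python
import random

def step_recombine(templates, min_ext=20, max_ext=50, seed=None):
    """Grow one strand by short extensions, re-annealing to a random template
    before each extension (a toy model of staggered extension).

    Returns the recombinant sequence and the list of template indices used,
    one entry per cycle.
    """
    rng = random.Random(seed)
    length = len(templates[0])
    if any(len(t) != length for t in templates):
        raise ValueError("templates must be aligned and of equal length")
    product, switches = [], []
    while len(product) < length:
        idx = rng.randrange(len(templates))   # anneal to any template
        ext = rng.randint(min_ext, max_ext)   # abbreviated extension
        start = len(product)
        # Slicing truncates at the template end, so the product never overshoots.
        product.extend(templates[idx][start:start + ext])
        switches.append(idx)
    return "".join(product), switches

if __name__ == "__main__":
    parents = ["A" * 300, "C" * 300]          # stand-ins for two parental genes
    gene, used = step_recombine(parents, seed=1)
    print(len(gene), len(used), used)
```

With two distinguishable parents, each run of identical letters in the output
marks one "recombinant cassette" copied from a single template.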

The different embodiments of the invention provide features and
advantages for different applications. In the most preferred embodiment, one
or more defined primers or defined primers exhibiting limited randomness
which correspond to or flank the 5' and 3' ends of the template
polynucleotides are used with StEP to generate gene fragments which grow
into the novel full-length sequences. This simple method requires no
knowledge of the template sequence(s).

In another preferred embodiment, multiple defined primers or defined 
primers exhibiting limited randomness are used to generate short gene 
fragments which are reassembled into full-length genes. Using multiple 
defined primers allows the user to bias in vitro recombination frequency. If 
sequence information is available, primers can be designed to generate 
overlapping recombination cassettes which increase the frequency of 
recombination at particular locations. Among other features, this method 
introduces the flexibility to take advantage of available structural and 
functional information as well as information accumulated through previous 
generations of mutagenesis and selection (or screening). 

In addition to recombination, the different embodiments of the
primer-based recombination process will generate point mutations. It is desirable to
know and be able to control this point mutation rate, which can be done by
manipulating the conditions of DNA synthesis and gene reassembly. Using
the defined-primer approach, specific point mutations can also be directed to
specific positions in the sequence through the use of mutagenic primers.

The various primer-based recombination methods in accordance with 
this invention have been shown to enhance the activity of Actinoplanes 
utahensis ECB deacylase over a broad range of pH values and in the presence 
of organic solvent and to improve the thermostability of Bacillus subtilis 
subtilisin E. DNA sequencing confirms the role of point mutation and 
recombination in the generation of novel sequences. These protocols have 
been found to be both simple and reliable. 

The above discussed and many other features and attendant 
advantages will become better understood by reference to the following 
detailed description when taken in conjunction with the accompanying 
drawings. 

BRIEF DESCRIPTION OF THE DRAWINGS 
FIG. 1 depicts recombination in accordance with the present invention
using random-sequence primers and gene reassembly. The steps shown are:
a) Synthesis of single-stranded DNA fragments using mesophilic or
thermophilic polymerase with random-sequence oligonucleotides as primers
(primers not shown); b) Removal of templates; c) Reassembly with
thermophilic DNA polymerase; d) Amplification with thermostable
polymerase(s); e) Cloning and Screening (optional); and f) Repeat the process
with selected gene(s) (optional).

FIG. 2 depicts recombination in accordance with the present invention
using defined primers. The method is illustrated for the recombination of two
genes, where x = mutation. The steps diagrammed are: a) The genes are
primed with defined primers in PCR reactions that can be done separately (2
primers per reaction) or combined (multiple primers per reaction); c) Initial
products are formed until defined primers are exhausted. Template is
removed (optional); d) Initial fragments prime and extend themselves in
further cycles of PCR with no addition of external primers. Assembly
continues until full-length genes are formed; e) (optional) Full-length genes are
amplified in a PCR reaction with external primers; f) (optional) Repeat the
process with selected gene(s).

FIG. 3 depicts recombination in accordance with the present invention
using two defined flanking primers and StEP. Only one primer and two single
strands from two templates are shown here to illustrate the recombination
process. The outlined steps are: a) After denaturation, template genes are
primed with one defined primer; b) Short fragments are produced by primer
extension for a short time; c) In the next cycle of StEP, fragments are
randomly primed to the templates and extended further; d) Denaturation and
annealing/extension is repeated until full-length genes are made (visible on an
agarose gel); e) Full-length genes are purified, or amplified in a PCR reaction
with external primers (optional); f) (optional) Repeat the process with selected
gene(s).

FIG. 4 is a diagrammatic representation of the results of the
recombination of two genes using two flanking primers and staggered
extension in accordance with the present invention. DNA sequences of five
genes chosen from the recombined library are indicated, where x is a mutation
present in the parental genes, and the triangle represents a new point
mutation.

FIG. 5 is a diagrammatic representation of the sequences of the pNB
esterase genes described in Example 3. Template genes 2-13 and 5-B12 were
recombined using the defined primer approach. The positions of the primers
are indicated by arrows, and the positions where the parental sequences differ
from one another are indicated by x's. New point mutations are indicated by
triangles. Mutations identified in these recombined genes are listed (only
positions which differ in the parental sequences are listed). Both 6E6 and
6H1 are recombination products of the template genes.

FIG. 6 shows the positions and sequences of the four defined internal
primers used to generate recombined genes from template genes R1 and R2
by interspersed primer-based recombination. Primer P50F contains a
mutation (A->T at base position 598) which simultaneously eliminates a
HindIII restriction site and adds a new unique NheI site. Gene R2 also
contains a mutation A->G at the same base position, which eliminates the
HindIII site.

FIG. 7 is an electrophoresis gel which shows the results of the 
restriction-digestion analysis of plasmids from the 40 clones. 

FIG. 8 shows the results of sequencing ten genes from the defined
primer-based recombination library. Lines represent 986-bp of subtilisin E
gene including 45 nt of its prosequence, the entire mature sequence and 113
nt after the stop codon. Crosses indicate positions of mutations from parent
gene R1 and R2, while triangles indicate positions of new point mutations
introduced during the recombination procedure. Circles represent the
mutation introduced by the mutagenic primer P50F.

FIG. 9 depicts the results of applying the random-sequence primer
recombination method to the gene for Actinoplanes utahensis ECB deacylase.
(a) The 2.4 kb ECB deacylase gene was purified from an agarose gel. (b) The
size of the random priming products ranged from 100 to 500 bases. (c)
Fragments shorter than 300 bases were isolated. (d) The purified fragments
were used to reassemble the full-length gene with a smear background. (e) A
single PCR product of the same size as the ECB deacylase gene was obtained
after conventional PCR with the two primers located at the start and stop
regions of this gene. (f) After digestion with Xho I and Psh AI, the PCR product
was cloned into a modified pIJ702 vector to form a mutant library. (g)
Introducing this library into Streptomyces lividans TK23 resulted in
approximately 71% clones producing the active ECB deacylase.

FIG. 10 shows the specific activity of the wild-type ECB deacylase and
mutant M16 obtained in accordance with the present invention.

FIG. 11 shows pH profiles of activity of the wild-type ECB deacylase
and mutant M16 obtained in accordance with the present invention.

FIG. 12 shows the DNA sequence analysis of 10 clones randomly chosen
from the library/Klenow. Lines represent 986-bp of subtilisin E gene including
45 nt of its prosequence, the entire mature sequence and 113 nt after the stop
codon. Crosses indicate positions of mutations from R1 and R2, while
triangles indicate positions of new point mutations introduced during the
random-priming recombination process.

FIG. 13 shows thermostability index profiles of the screened clones from the
five libraries produced using different polymerases: a) library/Klenow, b)
library/T4, c) library/Sequenase, d) library/Stoffel and e) library/Pfu.
Normalized residual activity (Ar/Ai) after incubation at 65°C was used as an
index of the enzyme thermostability. Data were sorted and plotted in
descending order.

DETAILED DESCRIPTION OF THE INVENTION
In one preferred embodiment of the present invention, a set of primers
with all possible nucleotide sequence combinations (dp(N)L, where L = primer
length) is used for the primer-based recombination. It has been known for
years that oligodeoxynucleotides of different lengths can serve as primers for
initiation of DNA synthesis on single-stranded templates by the Klenow
fragment of E. coli polymerase I (21). Although they are smaller than the size
of a normal PCR primer (i.e. less than 13 bases), oligomers as short as
hexanucleotides can adequately prime the reaction and are frequently used in
labeling reactions (22). The use of random primers to create a pool of gene
fragments followed by gene reassembly in accordance with the invention is
shown in FIG. 1. The steps include generation of diverse "breeding blocks"
from the single-stranded polynucleotide templates through random priming,
reassembly of the full-length DNA from the generated short, nascent DNA
fragments by thermocycling in the presence of DNA polymerase and
nucleotides, and amplification of the desired genes from the reassembled
products by conventional PCR for further cloning and screening. This
procedure introduces new mutations mainly at the priming step but also
during other steps. These new mutations and the mutations already present
in the template sequences are recombined during reassembly to create a
library of novel DNA sequences. The process can be repeated on the selected
sequences, if desired.
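
As a small worked illustration of what "all possible nucleotide sequence
combinations" implies, a pool of primers of length L contains 4^L distinct
sequences, so a hexanucleotide pool contains 4^6 = 4096. The Python sketch
below enumerates such a pool and draws a single random primer; the function
names `primer_pool` and `random_primer` are hypothetical.

```python
from itertools import product
import random

def primer_pool(length):
    """Enumerate every sequence in a random-primer pool of the given length."""
    return ["".join(p) for p in product("ACGT", repeat=length)]

def random_primer(length, rng=random):
    """Draw one primer uniformly at random without building the whole pool."""
    return "".join(rng.choice("ACGT") for _ in range(length))

if __name__ == "__main__":
    print(len(primer_pool(6)))      # number of distinct hexanucleotides
    print(random_primer(6))
```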

To carry out the random priming procedure, the template(s) can be
single- or denatured double-stranded polynucleotide(s) in linear or closed
circular form. The templates can be mixed in equimolar amounts, or in
amounts weighted, for example, by their functional attributes. Since, at least
in some cases, the template genes are cloned in vectors into which no
additional mutations should be introduced, they are usually first cleaved with
restriction endonuclease(s) and purified from the vectors. The resulting linear
DNA molecules are denatured by boiling, annealed to random-sequence
oligodeoxynucleotides and incubated with DNA polymerase in the presence of
an appropriate amount of dNTPs. Hexanucleotide primers are preferred,
although longer random primers (up to 24 bases) may also be used, depending
on the DNA polymerase and conditions used during random priming
synthesis. Thus the oligonucleotides prime the DNA of interest at various
positions along the entire target region and are extended to generate short
DNA fragments complementary to each strand of the template DNA. Due to
events such as base mis-incorporations and mispriming, these short DNA
fragments also contain point mutations. Under routinely established reaction
conditions, the short DNA fragments can prime one another based on
homology and be reassembled into full-length genes by repeated
thermocycling in the presence of thermostable DNA polymerase. The resulting
full-length genes will have diverse sequences, most of which, however, still
resemble that of the original template DNA. These sequences can be further
amplified by a conventional PCR and cloned into a vector for expression.
Screening or selection of the expressed mutants should lead to variants with
improved or even new specific functions. These variants can be immediately
used as partial solutions to a practical problem, or they can serve as new
starting points for further cycles of directed evolution.
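
The fragment-generation step can be pictured with the following toy
simulation, offered as a sketch under stated assumptions rather than a model
of the actual chemistry. It assumes uniformly distributed priming positions,
a uniformly distributed fragment length up to a chosen maximum, and an
independent per-base misincorporation probability. For simplicity the
fragments are written in template orientation rather than as complementary
strands. The name `random_priming` and its parameters are hypothetical.

```python
import random

def random_priming(template, n_primers, max_len, error_rate=0.0, seed=None):
    """Return (start, fragment) pairs copied from random template positions.

    Each fragment starts at a uniformly chosen position and extends for a
    random length of 1 to `max_len` bases (truncated at the template end).
    Each copied base is replaced by a different base with probability
    `error_rate`, mimicking misincorporation.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_primers):
        start = rng.randrange(len(template))
        length = rng.randint(1, max_len)
        copied = []
        for base in template[start:start + length]:
            if rng.random() < error_rate:
                base = rng.choice([b for b in "ACGT" if b != base])
            copied.append(base)
        fragments.append((start, "".join(copied)))
    return fragments

if __name__ == "__main__":
    gene = "ACGT" * 50
    for start, frag in random_priming(gene, 5, 30, error_rate=0.01, seed=3):
        print(start, frag)
```

Because the start positions are spread over the whole template, the resulting
fragments overlap one another, which is what allows them to prime each other
during reassembly.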

Compared to other techniques used for protein optimization, such as
combinatorial cassette and oligonucleotide-directed mutagenesis (24,25,26),
error-prone PCR (27,28), or DNA shuffling (17,18,19), some of the advantages
of the random-primer based procedure for in vitro protein evolution are
summarized as follows:

1. The template(s) used for random priming synthesis may be
either single- or double-stranded polynucleotides. In contrast, error-prone
PCR and the DNA shuffling method for recombination (17,18,19) necessarily
employ only double-stranded polynucleotides. Using the technique described
here, mutations and/or crossovers can be introduced at the DNA level by
using different DNA-dependent DNA polymerases, or even directly from mRNA
by using different RNA-dependent DNA polymerases. Recombination can be
performed using single-stranded DNA templates.

2. In contrast to the DNA shuffling procedure, which requires
fragmentation of the double-stranded DNA template (generally done with
DNase I) to generate random fragments, the technique described here employs
random priming synthesis to obtain DNA fragments of controllable size as
"breeding blocks" for further reassembly (FIG. 1). One immediate advantage is
that two sources of nuclease activity (DNase I and 5'-3' exonuclease) are
eliminated, and this allows easier control over the size of the final reassembly
and amplification gene fragments.

3. Since the random primers are a population of synthetic
oligonucleotides that contain all four bases in every position, they are uniform in
their length and lack a sequence bias. The sequence heterogeneity allows
them to form hybrids with the template DNA strands at many positions, so
that every nucleotide of the template (except, perhaps, those at the extreme 5'
terminus) should be copied at a similar frequency into products. In this way,
both mutations and crossover may happen more randomly than, for example,
with error-prone PCR or DNA shuffling.

4. The random-primed DNA synthesis is based on the hybridization
of a mixture of hexanucleotides to the DNA templates, and the complementary
strands are synthesized from the 3'-OH termini at the random hexanucleotide
primer using polymerase and the four deoxynucleotide triphosphates. Thus
the reaction is independent of the length of the DNA template. DNA fragments
of 200 bases length can be primed equally well as linearized plasmid or λ DNA
(29). This is particularly useful for engineering peptides, for example.

5. Since DNase I is an endonuclease that hydrolyzes
double-stranded DNA preferentially at sites adjacent to pyrimidine nucleotides, its use
in DNA shuffling may result in bias (particularly for genes with high G+C or
high A+T content) at the step of template gene digestion. Effects of this
potential bias on the overall mutation rate and recombination frequency may
be avoided by using the random-priming approach. Bias in random priming
due to preferential hybridization to GC-rich regions of the template DNA could
be overcome by increasing the A and T content in the random oligonucleotide
library.

An important part of practicing the present invention is controlling the
average size of the nascent, single-strand DNA synthesized during the random
priming process. This step has been studied in detail by others. Hodgson and
Fisk (30) found that the average size of the synthesized single-strand DNA is
an inverse function of primer concentration: $\text{length} = k/\sqrt{\ln P_c}$, where $P_c$ is
the primer concentration. The inverse relationship between primer
concentration and output DNA fragment size may be due to steric hindrance. Based
on this guideline, proper conditions for random-priming synthesis can be
readily set for individual genes of different lengths.
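
To show how this guideline might be applied, the short Python sketch below
evaluates the relation as length = k / sqrt(ln Pc). It is an illustration
only: the specification gives neither the value of the constant k nor the
units of Pc, so both are treated here as user-supplied assumptions, and the
function name `fragment_length` is hypothetical. The relation is only defined
for Pc greater than 1 in whatever units are chosen.

```python
import math

def fragment_length(primer_conc, k):
    """Average fragment length from length = k / sqrt(ln(primer_conc)).

    `k` is an empirical constant and `primer_conc` is in arbitrary units;
    neither is specified in the text, so both are assumptions of the caller.
    """
    if primer_conc <= 1:
        raise ValueError("ln(primer_conc) must be positive")
    return k / math.sqrt(math.log(primer_conc))

if __name__ == "__main__":
    # Raising the primer concentration shortens the expected fragments.
    for pc in (10, 100, 1000):
        print(pc, round(fragment_length(pc, k=400.0), 1))
```

Because the dependence is on the square root of a logarithm, large changes in
primer concentration produce only modest changes in average fragment length.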

Since dozens of polymerases are currently available, synthesis of the
short, nascent DNA fragments can be achieved in a variety of fashions. For
example, bacteriophage T4 DNA polymerase (23) or T7 Sequenase version 2.0
DNA polymerase (31,32) can be used for the random priming synthesis.

For single-stranded polynucleotide templates (particularly for RNA
templates), a reverse transcriptase is preferred for random-priming synthesis.
Since this enzyme lacks 3'->5' exonuclease activity, it is rather prone to error.
In the presence of high concentrations of dNTPs and Mn2+, about 1 base in
every 500 is misincorporated (29).

By modifying the reaction conditions, the PCR can be adjusted for the
random priming synthesis using thermostable polymerase for the short,
nascent DNA fragments. An important consideration is to identify by routine
experimentation the reaction conditions which ensure that the short random
primers can anneal to the templates and give sufficient DNA amplification at
higher temperatures. We have found that random primers as short as dp(N)12
can be used with PCR to generate the extended primers. Adapting the PCR to
the random priming synthesis provides a convenient method to make short,
nascent DNA fragments and makes this random priming recombination
technique very robust.

In many evolution scenarios, recombination should be conducted
between oligonucleotide sequences for which sequence information is available
for at least some of the template sequences. In such scenarios, it is often
possible to define and synthesize a series of primers which are interspersed
between the various mutations. When defined primers are used, they can be
between 6 and 100 bases long. In accordance with the present invention, it
was discovered that by allowing these defined primers to initiate a series of
overlapping primer extension reactions (which may be facilitated by
thermocycling), it is possible to generate recombination cassettes each containing one
or more of the accumulated mutations, allelic or isotypic differences between
templates. Using the defined primers in such a way that overlapping
extension products are generated in the DNA polymerization reactions,
exhaustion of available primer leads to the progressive cross-hybridization of
primer extended products until complete gene products are generated. The
repeated rounds of annealing, extension and denaturation assure
recombination of each overlapping cassette with every other.

A preferred embodiment of the present invention involves methods in
which a set of defined oligonucleotide primers is used to prime DNA synthesis.
FIG. 2 illustrates an exemplary version of the present invention in which
defined primers are used. Careful design and positioning of oligonucleotide
primers facilitates the generation of non-random extended recombination
primers and is used to determine the major recombination (co-segregation)
events along the length of homologous templates.
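
One way to picture the positioning of defined primers between mutations is
the following Python sketch. It assumes two pre-aligned parental sequences of
equal length and simply reports the stretches, at least as long as a chosen
minimum primer length, in which the parents are identical; a primer placed in
such a stretch would anneal equally to both templates. The function names and
the default minimum of 6 bases are illustrative assumptions, not a primer
design procedure taken from the specification.

```python
def difference_positions(parent_a, parent_b):
    """Indices at which two aligned, equal-length sequences differ."""
    if len(parent_a) != len(parent_b):
        raise ValueError("sequences must be aligned and of equal length")
    return [i for i, (x, y) in enumerate(zip(parent_a, parent_b)) if x != y]

def conserved_windows(parent_a, parent_b, min_len=6):
    """Half-open (start, end) stretches with no differences, >= min_len long."""
    diffs = difference_positions(parent_a, parent_b)
    bounds = [-1] + diffs + [len(parent_a)]
    windows = []
    for left, right in zip(bounds, bounds[1:]):
        start, end = left + 1, right
        if end - start >= min_len:
            windows.append((start, end))
    return windows

if __name__ == "__main__":
    a = "A" * 10 + "C" + "A" * 10
    b = "A" * 21
    print(difference_positions(a, b))   # the single differing position
    print(conserved_windows(a, b))      # candidate primer regions on each side
```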

Another embodiment of the present invention is an alternative 
approach to primer-based gene assembly and recombination in the presence 
of template. Thus, as illustrated in FIG. 3, the present invention includes
recombination in which enzyme-catalyzed DNA polymerization is allowed to 
proceed only briefly (by limiting the time and lowering the temperature of the 
extension step) prior to denaturation. Denaturation is followed by random 
annealing of the extended fragments to template sequences and continued 
partial extension. This process is repeated multiple times, depending on the 
concentration of primer and template, until full length sequences are made. 
This process is called staggered extension, or StEP. Although random primers 
can also be used for StEP, gene synthesis is not nearly as efficient as with 
defined primers. Thus defined primers are preferred. 

In this method, one or more brief annealing/extension steps are used to
generate the partially extended primer. A typical annealing/extension step is done
under conditions which allow high fidelity primer annealing (Tannealing greater
than Tm-25), but limit the polymerization/extension to no more than a few
seconds (or an average extension to less than 300 nts). Minimum extensions
are preferably on the order of 20-50 nts. It has been demonstrated that
thermostable DNA polymerases typically exhibit maximal polymerization rates
of 100-150 nucleotides/second/enzyme molecule at optimal temperatures,
but follow approximate Arrhenius kinetics at temperatures approaching the
optimum temperature (Topt). Thus, at a temperature of 55°C, a thermostable
polymerase exhibits only 20-25% of the steady state polymerization rate that it
exhibits at 72°C (Topt), or 24 nts/second (40). At 37°C and 22°C, Taq
polymerase is reported to have extension activities of 1.5 and 0.25
nts/second, respectively (24). Both time and temperature can be routinely
altered based on the desired recombination events and knowledge of basic
polymerase kinetics and biochemistry.
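
The trade-off between time and temperature can be made concrete with a small
calculation. The Python sketch below uses only the extension rates quoted
above (24, 1.5 and 0.25 nts/second at 55°C, 37°C and 22°C) and assumes,
as a simplification, that the expected extension is simply rate multiplied by
time; the names `RATE_NT_PER_S`, `expected_extension` and `seconds_for` are
hypothetical.

```python
# Extension rates quoted in the text (nts/second), keyed by temperature in deg C.
RATE_NT_PER_S = {55: 24.0, 37: 1.5, 22: 0.25}

def expected_extension(temp_c, seconds):
    """Approximate nucleotides added in `seconds` at `temp_c` (rate x time)."""
    return RATE_NT_PER_S[temp_c] * seconds

def seconds_for(temp_c, target_nt):
    """Approximate time needed to extend by `target_nt` nucleotides."""
    return target_nt / RATE_NT_PER_S[temp_c]

if __name__ == "__main__":
    print(expected_extension(55, 5))   # nucleotides added in 5 s at 55 deg C
    print(seconds_for(37, 30))         # seconds to add 30 nts at 37 deg C
```

Under this simplification, a five-second step at 55°C adds on the order of
120 nts, which is consistent with keeping the average extension below 300 nts.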

The progress of the staggered extension process is monitored by
removing aliquots from the reaction tube at various time points in the primer
extension and separating DNA fragments by agarose gel electrophoresis.
Evidence of effective primer extension is seen from the appearance of a low
molecular weight 'smear' early in the process which increases in molecular
weight with increasing cycle number.

Unlike the gene amplification process (which generates new DNA
exponentially), StEP generates new DNA fragments in an additive manner in
its early cycles which contain DNA segments corresponding to the different
template genes. Under non-amplifying conditions, 20 cycles of StEP generates
a maximal molar yield of DNA of approximately 40 times the initial template
concentration. In comparison, the idealized polymerase chain reaction
process for gene amplification is multiplicative throughout, giving a maximal
molar yield of approximately 1 x 10^6-fold through the same number of steps.
In practice, the difference between the two processes can be observed by PGR, 

25 giving a dear 'band' after only a few (less than 10) cycles when starting with 

template at concentrations of less than 1 ng/ul and primers at lO-500-fold 
excess (vs. 10^-fold excess typical of gene amplification). Under similar 
reaction conditions, the StEP would be expected to give a less visible 'smear*, 
which increases in molecular weight with increasing number of cycles. When 

30 significant numbers of primer extended DNA molecules begin to reach sizes of 

greater than 1/2 the length of the full length gene, a rapid jump in molecular 
weight occurs, as half-extended forward and reverse strands begin to cross- 
hybridize to generate fragments nearly 2 dmes the size of those encountered 
to that point an the process. At this point, consolidation of the smear into a 

35 discrete band of the appropriate molecular weight can occur rapidly by either 
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continuing to subject the DNA to St£P» or altering the thermocycle to allow 
complete extension of the primed DNA to drive exponential gene amplification. 
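
The contrast between additive and multiplicative accumulation can be checked with a few lines of Python. This is an idealized sketch. It assumes that each StEP cycle adds at most one new strand per template strand (two per duplex template), which is one reading of the 40-fold figure above, and that ideal PCR doubles the product every cycle. The function names are hypothetical.

```python
def step_max_fold(cycles: int, strands_per_template: int = 2) -> int:
    """Additive upper bound: a fixed number of new strands per cycle."""
    return strands_per_template * cycles


def pcr_max_fold(cycles: int) -> int:
    """Idealized PCR upper bound: the product doubles every cycle."""
    return 2 ** cycles


if __name__ == "__main__":
    n = 20
    print("StEP, additive:", step_max_fold(n))        # 40
    print("PCR, multiplicative:", pcr_max_fold(n))    # 1048576, about 1 x 10^6
```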

Following gene assembly (and, if necessary, conversion to double 
stranded form) recombined genes are amplified (optional), digested with 
suitable restriction enzymes and ligated into expression vectors for screening
of the expressed gene products. The process can be repeated if desired, in
order to accumulate sequence changes leading to the evolution of desired 
functions. 

The staggered extension and homologous gene assembly process (StEP)
represents a powerful, flexible method for recombining similar genes in a
random or biased fashion. The process can be used to concentrate
recombination within or away from specific regions of a known series of
sequences by controlling placement of primers and the time allowed for
annealing/extension steps. It can also be used to recombine specific cassettes
of homologous genetic information generated separately or within a single
reaction. The method is also applicable to recombining genes for which no
sequence information is available but for which functional 5' and 3'
amplification primers can be prepared. Unlike other recombination methods,
the staggered extension process can be run in a single tube using
conventional procedures without complex separation or purification steps.

Some of the advantages of the defined-primer embodiments of the 
present invention are summarized as follows: 

1. The StEP method does not require separation of parent
molecules from assembled products.

2. Defined primers can be used to bias the location of
recombination events.

3. StEP allows the recombination frequency to be adjusted by
varying extension times.

4. The recombination process can be carried out in a single tube.

5. The process can be carried out on single-stranded or
double-stranded polynucleotides.

6. The process avoids the bias introduced by DNase I or other
endonucleases.

7. Universal primers can be used.

8. Defined primers exhibiting limited randomness can be used to
increase the frequency of mutation at selected areas of the gene.

As will be appreciated by those skilled in the art, several embodiments
of the present invention are possible. Exemplary embodiments include:

1. Recombination and point mutation of related genes using only
defined flanking primers and staggered extension.

2. Recombination and mutation of related genes using flanking 
primers and a series of internal primers at low enough 
concentration that exhaustion of the primers will occur over the 
course of the thermocycling, forcing the overlapping gene 
fragments to cross-hybridize and extend until recombined 
synthetic genes are formed. 

3. Recombination and mutation of genes using random-sequence 
primers at high concentration to generate a pool of short DNA 
fragments which are reassembled to form new genes. 

4. Recombination and mutation of genes using a set of defined 
primers to generate a pool of DNA fragments which are 
reassembled to form new genes. 

5. Recombination and mutation of single-stranded polynucleotides 
using one or more defined primers and staggered extension to 
form new genes. 

6. Recombination using defined primers with limited randomness 
at more than 30% or more than 60% of the nucleotide positions 
within the primer. 

Examples of practice showing use of the primer-based recombination 
method are as follows. 

EXAMPLE 1 

Use of defined flanking primers and staggered extension to 
recombine and enhance the thermostability of subtilisin E 

This example shows how the defined primer recombination method can
be used to enhance the thermostability of subtilisin E by recombination of two
genes known to encode subtilisin E variants with thermostabilities exceeding
that of wild-type subtilisin E. This example demonstrates the general method
outlined in FIG. 3 utilizing only two primers corresponding to the 5' and 3'
ends of the templates.

As outlined in FIG. 3, extended recombination primers are first
generated by the staggered extension process (StEP), which consists of
repeated cycles of denaturation followed by extremely abbreviated
annealing/extension step(s). The extended fragments are reassembled into
full-length genes by thermocycling-assisted homologous gene assembly in the
presence of a DNA polymerase, followed by an optional gene amplification
step.

Two thermostable subtilisin E mutants R1 and R2 were used to test the
defined primer based recombination technique using staggered extension.
The positions at which these two genes differ from one another are shown in
Table 1. Among the ten nucleotide positions that differ in R1 and R2, only
those mutations leading to amino acid substitutions Asn 181-Asp (N181D) and
Asn 218-Ser (N218S) confer thermostability. The remaining mutations are
neutral with respect to their effects on thermostability (33). The half-lives at
65°C of the single variants N181D and N218S are approximately 3-fold and
2-fold greater than that of wild type subtilisin E, respectively, and their melting
temperatures, Tm, are 3.7°C and 3.2°C higher than that of wild type enzyme,
respectively. Random recombination events that yield sequences containing
both these functional mutations will give rise to enzymes whose half-lives at
65°C are approximately 8-fold greater than that of wild type subtilisin E,
provided no new deleterious mutations are introduced into these genes during
the recombination process. Furthermore, the overall point mutagenesis rate
associated with the recombination process can be estimated from the catalytic
activity profile of a small sampling of the recombined variant library. If the
point mutagenesis rate is zero, 25% of the population should exhibit wild
type-like activity, 25% of the population should have double mutant
(N181D+N218S)-like activity and the remaining 50% should have single
mutant (N181D or N218S)-like activity. Finite point mutagenesis increases
the fraction of the library that encodes enzymes with wild-type like (or lower)
activity. This fraction can be used to estimate the point mutagenesis rate.

TABLE 1

DNA and amino acid substitutions in thermostable
subtilisin E mutants R1 and R2.

Gene   Base   Base           Position   Amino   Amino acid
              substitution   in codon   acid    substitution

R1     780    A->G           2          109     Asn->Ser
       1107   A->G           2          218     Asn->Ser
       1141                  3          229     synonymous
       1153   A->G           3          233     synonymous

R2     484    A->G           3          10      synonymous
       520    A->T           3          22      synonymous
       598    A->G           3          48      synonymous
       731    G->A           1          93      Val->Ile
       745    T->C           3          97      synonymous
       780    A->G           2          109     Asn->Ser
       995    A->G           1          181     Asn->Asp
       1189   A->G           3          245     synonymous

Mutations listed are relative to wild type subtilisin E with base substitution at
780 in common.
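
The relationship between the base position, position in codon and amino acid columns of Table 1 can be sketched in Python. The offset used below (codon 1 starting at base 455) is not stated in the text. It is an assumption inferred from the table rows themselves, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Assumption inferred from Table 1 (not stated in the text): the first base
# of codon 1 sits at base position 455 in the numbering used by the table.
CODON1_FIRST_BASE = 455


def codon_coordinates(base_position: int) -> Tuple[int, int]:
    """Return (amino acid number, position in codon), both 1-based."""
    offset = base_position - CODON1_FIRST_BASE
    if offset < 0:
        raise ValueError("base lies upstream of codon 1")
    return offset // 3 + 1, offset % 3 + 1


# (amino acid number, position in codon) as listed in Table 1.
TABLE_1: Dict[int, Tuple[int, int]] = {
    484: (10, 3), 520: (22, 3), 598: (48, 3), 731: (93, 1),
    745: (97, 3), 780: (109, 2), 995: (181, 1), 1107: (218, 2),
    1141: (229, 3), 1153: (233, 3), 1189: (245, 3),
}

if __name__ == "__main__":
    for base, listed in sorted(TABLE_1.items()):
        print(base, codon_coordinates(base), "matches table:", codon_coordinates(base) == listed)
```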



Materials and Methods 

Procedure for defined primer based recombination using two flanking primers. 

Two defined primers, P5N (5'-CCGAG CGTTG CATAT GT GGA AG-3'
(SEQ. ID. NO: 1), underlined sequence is NdeI restriction site) and P3B
(5'-CGACT CTAGA GGATC CG ATT C-3' (SEQ. ID. NO: 2), underlined sequence is
BamHI restriction site), corresponding to 5' and 3' flanking primers,
respectively, were used for recombination. Conditions (100 ul final volume):
0.15 pmol plasmid DNA containing genes R1 and R2 (mixed at 1:1) were used
as template, 15 pmol of each flanking primer, 1 times Taq buffer, 0.2 mM of
each dNTP, 1.5 mM MgCl2 and 0.25 U Taq polymerase. Program: 5 minutes at
95°C, 80 cycles of 30 seconds at 94°C, 5 seconds at 55°C. The product of correct
size (approximately 1 kb) was cut from a 0.8% agarose gel after
electrophoresis and purified using QIAEX II gel extraction kit. This purified
product was digested with NdeI and BamHI and subcloned into pBE3 shuttle
vector. This gene library was amplified in E. coli HB101 and transferred into
B. subtilis DB428 competent cells for expression and screening, as described
elsewhere (35).
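
For bookkeeping, the reaction described above can be captured in a small Python record. This is a minimal sketch: the class and field names are hypothetical, and the time estimate counts programmed hold times only, ignoring instrument ramp times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepReaction:
    template_pmol: float
    primer_pmol_each: float
    cycles: int
    denature_s: int
    anneal_extend_s: int
    initial_denature_s: int

    def primer_excess(self) -> float:
        """Molar excess of each flanking primer over template."""
        return self.primer_pmol_each / self.template_pmol

    def hold_time_s(self) -> int:
        """Sum of programmed hold times in seconds (ramping not included)."""
        per_cycle = self.denature_s + self.anneal_extend_s
        return self.initial_denature_s + self.cycles * per_cycle


# Values taken from the conditions and program given in the text.
EXAMPLE_1 = StepReaction(
    template_pmol=0.15, primer_pmol_each=15.0, cycles=80,
    denature_s=30, anneal_extend_s=5, initial_denature_s=300,
)

if __name__ == "__main__":
    print(f"primer excess: ~{EXAMPLE_1.primer_excess():.0f}-fold")
    print(f"programmed hold time: {EXAMPLE_1.hold_time_s()} s")
```

With the quoted amounts the primer excess is about 100-fold, and the programmed holds add up to 3100 seconds.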

DNA sequencing 

Genes were purified using QIAprep spin plasmid miniprep kit to obtain
sequencing quality DNA. Sequencing was done on an ABI 373 DNA
Sequencing System using the Dye Terminator Cycle Sequencing kit
(Perkin-Elmer, Branchburg, NJ).

Results 

The progress of the staggered extension was monitored by removing
aliquots (10 ul) from the reaction tube at various time points in the primer
extension process and separating DNA fragments by agarose gel
electrophoresis. Gel electrophoresis of primer extension reactions revealed
that annealing/extension reactions of 5 seconds at 55°C resulted in the
occurrence of a smear approaching 100 bp (after 20 cycles), 400 bp (after 40
cycles), 800 bp (after 60 cycles) and finally a strong approximately 1 kb band
within this smear. This band (mixture of reassembled products) was gel
purified, digested with restriction enzymes BamHI and NdeI, and ligated with
vector generated by BamHI-NdeI digestion of the E. coli/B. subtilis pBE3
shuttle vector. This gene library was amplified in E. coli HB101 and
transferred into B. subtilis DB428 competent cells for expression and
screening (35).

The thermostability of enzyme variants was determined in the 96-well
plate format described previously (33). About 200 clones were screened, and
approximately 25% retained subtilisin activity. Among these active clones, the
frequency of the double mutant-like phenotype (high thermostability) was
approximately 23%, the single mutant-like phenotype was approximately 42%,
and wild type-like phenotype was approximately 34%. This distribution is
very close to the values expected when the two thermostable mutations N218S
and N181D can recombine with each other completely freely.
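
The expected distribution for two freely recombining sites can be enumerated directly. The Python sketch below assumes that each of the two sites is inherited independently and with equal probability from either parent (templates mixed 1:1) and that no new point mutations arise. The function name is hypothetical, and the observed percentages are those reported above.

```python
from fractions import Fraction
from itertools import product
from typing import Dict


def expected_distribution() -> Dict[int, Fraction]:
    """Map number of stabilizing mutations (0, 1 or 2) to its expected fraction."""
    counts = {0: 0, 1: 0, 2: 0}
    # Each site carries either the mutation (1) or the wild type base (0).
    for site_181, site_218 in product((0, 1), repeat=2):
        counts[site_181 + site_218] += 1
    return {k: Fraction(v, 4) for k, v in counts.items()}


# Approximate percentages among active clones, as reported in the text.
OBSERVED_PERCENT = {2: 23, 1: 42, 0: 34}

if __name__ == "__main__":
    expected = expected_distribution()
    for n_mut in (2, 1, 0):
        print(f"{n_mut} mutation(s): expected {float(expected[n_mut]) * 100:.0f}%, "
              f"observed ~{OBSERVED_PERCENT[n_mut]}%")
```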

Twenty clones were randomly picked from the E. coli HB101 gene library.
Their plasmid DNAs were isolated and digested with NdeI and BamHI. Nine
out of 20 (45%) had the inserts of correct size (approximately 1 kb). Thus,
approximately 55% of the above library had no activity due to lack of the
correct subtilisin E gene. These clones are not members of the subtilisin
library and should be removed from our calculations. Taking into account
this factor, we find that 55% of the library (25% active clones/45% clones with
correct size insert) retained subtilisin activity. This activity profile indicates a
point mutagenesis rate of less than 2 mutations per gene (36). Five clones with
inserts of the correct size were sequenced. The results are summarized in FIG.
4. All five genes are recombination products with minimum crossovers
varying from 1 to 4. Only one new point mutation was found in these five
genes.
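
The correction for clones lacking an insert is a single division, shown here as a small Python sketch with a hypothetical function name. Exact fractions are used so that the rounding is explicit: 25%/45% equals 5/9, or roughly the 55% figure quoted above.

```python
from fractions import Fraction


def corrected_active_fraction(active: Fraction, correct_insert: Fraction) -> Fraction:
    """Fraction of genuine library members (clones with insert) that are active."""
    if correct_insert <= 0:
        raise ValueError("fraction of clones with insert must be positive")
    return active / correct_insert


if __name__ == "__main__":
    insert_fraction = Fraction(9, 20)      # nine of 20 clones had a ~1 kb insert
    active_fraction = Fraction(25, 100)    # ~25% of screened clones were active
    corrected = corrected_active_fraction(active_fraction, insert_fraction)
    print(f"{corrected} = {float(corrected):.3f}")
```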

EXAMPLE 2 

Use of defined flanking primers and staggered extension 
to recombine pNB esterase mutants 

The two-primer recombination method used here for pNB esterase is
analogous to that described in Example 1 for subtilisin E. Two template pNB
esterase mutant genes that differ at 14 bases are used. Both templates (61C7
and 4G4) are used in the plasmid form. Both target genes are present in the
extension reaction at a concentration of 1 ng/ul. Flanking primers (RM1A and
RM2A, Table 2) are added at a final concentration of 2 ng/ul (approximately
200-fold molar excess over template).

TABLE 2

Primers used in the recombination of the pNB esterase genes

Primer   Sequence

RM1A     GAG CAC ATC AGA TCT ATT AAC (SEQ. ID. NO: 3)
RM2A     GGA GTG GCT CAC AGT CGG TGG (SEQ. ID. NO: 4)

Clone 61C7 was isolated based on its activity in organic solvent and
contains 13 DNA mutations vs. the wild-type sequence. Clone 4G4 was
isolated for thermostability and contains 17 DNA mutations when compared
with wild-type. Eight mutations are shared between them due to common
ancestry. The gene product from 4G4 is significantly more thermostable than 
the gene product from 61C7. Thus, one measure of recombination between 
the genes is the co-segregation of the high solvent activity and high 
thermostability or the loss of both properties in the recombined genes. In 
addition, recombination frequency and mutagenic rate can be ascertained by 
sequencing random clones. 

For the pNB esterase gene, primer extension proceeds through 90
rounds of extension with a thermocycle consisting of 30 seconds at 94°C
followed by 15 seconds at 55°C. Aliquots (10 ul) are removed following cycle
20, 40, 60, 70, 80 and 90. Agarose gel electrophoresis reveals the formation
of a low molecular weight 'smear' by cycle 20, which increases in average size
and overall intensity at each successive sample point. By cycle 90, a
pronounced smear is evident extending from 0.5 kb to 4 kb, and exhibiting
maximal signal intensity at a size of approximately 2 kb (the length of the full
length genes). The jump from half-length to full length genes appears to occur
between cycles 60 and 70.

WO 98/42728    PCT/US98/05814

The intense smear is amplified through 6 cycles of polymerase chain 
reaction to more clearly define the full length recombined gene population. A 
minus-primer control is also amplified with flanking primers to determine the 
background due to residual template in the reaction mix. Band intensity from 
the primer extended gene population exceeds that of the control by greater
than 10-fold, indicating that amplified, non-recombined template comprises
only a small fraction of the amplified gene population.

The amplified recombined gene pool is digested with restriction
enzymes XbaI and BamHI and ligated into the pNB106R expression vector
described by Zock et al. (35). Transformation of ligated DNA into E. coli strain
TG1 is done using the well characterized calcium chloride transformation
procedure. Transformed colonies are selected on LB/agar plates containing
20 ug/ml tetracycline.

The mutagenic rate of the process is determined by measuring the 
percent of clones expressing an active esterase (20). In addition, colonies 
picked at random are sequenced and used to define the mutagenic frequency 
of the method and the efficiency of recombination. 

EXAMPLE 3

Recombination of pNB esterase genes using interspersed
internal defined primers and staggered extension

This example demonstrates that the interspersed defined primer
recombination technique can produce novel sequences through point
mutagenesis and recombination of mutations present in the parent
sequences.

Experimental design and background information 

Two pNB esterase genes (2-13 and 5-B12) were recombined using the
defined primer recombination technique. Gene products from both 2-13 and
5-B12 are measurably more thermostable than wild-type. Gene 2-13 contains
9 mutations not originally present in the wild-type sequence, while gene
5-B12 contains 14. The positions at which these two genes differ from one
another are shown in FIG. 5.

Table 3 shows the sequences of the eight primers used in this example.
Location (at the 5' end of the template gene) of oligo annealing to the template
genes is indicated in the table, as is primer orientation (F indicates a forward
primer, R indicates reverse). These primers are shown as arrows along gene
2-13 in FIG. 5.

TABLE 3

Sequences of primers used in this example

name   orientation   location   sequence

RM1A   F             -76        GAGCACATCAGATCTATTAAC (SEQ. ID. NO: 3)
RM2A   R             +454       GGAGTGGCTCACAGTCGGTGG (SEQ. ID. NO: 4)
S2     F             400        TTGAACTATCGGCTGGGGCGG (SEQ. ID. NO: 5)
S5     F             1000       TTACTAGGGAAGCCGCTGGCA (SEQ. ID. NO: 6)
S7     F             1400       TCAGAGATTACGATCGAAAAC (SEQ. ID. NO: 7)
S6     R             1280       GGATTGTATCGTPGTGAGAAAG (SEQ. ID. NO: 8)
S10    R             880        AATGCCGGAAGCAGCCCCTTC (SEQ. ID. NO: 9)
S13    R             280        CACGACAGGAAGATTTTGACT (SEQ. ID. NO: 10)



Materials and Methods

Defined-primer based recombination

1. Preparation of genes to be recombined. Plasmids containing the
genes to be recombined were purified from transformed TG1 cells using the
Qiaprep kit (Qiagen, Chatsworth, CA). Plasmids were quantitated by UV
absorption and mixed 1:1 for a final concentration of 50 ng/ul.

2. Staggered extension PCR and reassembly. 4 ul of the plasmid
mixture was used as template in a 100 ul standard reaction (1.5 mM MgCl2,
50 mM KCl, 10 mM Tris-HCl pH 9.0, 0.1% Triton X-100, 0.2 mM dNTPs, 0.25
U Taq polymerase (Promega, Madison, WI)) which also contained 12.5 ng of
each of the 8 primers. A control reaction which contained no primers was also
assembled. Reactions were thermocycled through 100 cycles of 94°C, 30
seconds; 55°C, 15 seconds. Checking an aliquot of the reaction on an agarose
gel at this point showed the product to be a large smear (with no visible
product in the no primer control).

3. DpnI digestion of the templates. 1 ul from the assembly
reactions was then digested with DpnI to remove the template plasmid. The
10 ul DpnI digest contained 1 x NEBuffer 4 and 5 U DpnI (both obtained from
New England Biolabs, Beverly, MA) and was incubated at 37°C for 45 minutes,
followed by incubation at 70°C for 10 minutes to heat kill the enzyme.

4. PCR amplification of the reassembled products. The 10 ul digest
was then added to 90 ul of a standard PCR reaction (as described in step 2)
containing 0.4 uM primers 5b (ACTTAATCTAGAGGGTATTA) (SEQ. ID. NO: 11)
and 3b (AGCCTCGCGGGATCCCCGGG) (SEQ. ID. NO: 12) specific for the ends
of the gene. After 20 cycles of standard PCR (94°C, 30 seconds; 48°C, 30
seconds; 72°C, 1 minute) a strong band of the correct size (2 kb) was visible
when the reaction was checked on an agarose gel, while only a very faint band
was visible in the lane from the no-primer control. The product band was
purified and cloned back into the expression plasmid pNB106R and
transformed by electroporation into TG1 cells.
Results 

Four 96 well plates of colonies resulting from this transformation were
assayed for pNB esterase initial activity and thermostability. Approximately
60% of the clones exhibited initial activity and thermostability within 20% of
the parental gene values. Very few (10%) of the clones were inactive (less than
10% of parent initial activity values). These results suggest a low rate of
mutagenesis. Four mutants with the highest thermostability values were
sequenced. Two clones (6E6 and 6H1) were the result of recombination
between the parental genes (FIG. 5). One of the remaining two clones
contained a novel point mutation, and one showed no difference from parent
5B12. The combination of mutations T99C and C204T in mutant 6E6 is
evidence for a recombination event between these two sites. In addition,
mutant 6H1 shows the loss of mutation A1072G (but the retention of
mutations C1038T and T1310C), which is evidence for two recombination
events (one between sites 1038 and 1072, and another between 1072 and
1310). A total of five new point mutations were found in the four genes
sequenced.

EXAMPLE 4 

Recombination of two thermostable subtilisin E variants
using internal defined primers and staggered extension

This example demonstrates that the defined primer recombination
technique can produce novel sequences containing new combinations of
mutations present in the parent sequences. It further demonstrates the utility
of the defined primer recombination technique to obtain further improvements
in enzyme performance (here, thermostability). This example further shows
that the defined primers can bias the recombination so that recombination
appears most often in the portion of the sequence defined by the primers
(inside the primers). Furthermore, this example shows that specific mutations
can be introduced into the recombined sequences by using the appropriate
defined primer sequence(s) containing the desired mutation(s).

Genes encoding two thermostable subtilisin E variants of Example 1
(R1 and R2) were recombined using the defined primer recombination
procedure with internal primers. FIG. 6 shows the four defined internal
primers used to generate recombined progeny genes from template genes R1
and R2 in this example. Primer P50F contains a mutation (A->T at base
position 598) which eliminates a HindIII restriction site and simultaneously
adds a new unique NheI site. This primer is used to demonstrate that specific
mutations can also be introduced into the population of recombined
sequences by specific design of the defined primer. Gene R2 also contains a
mutation A->G at the same base position, which eliminates the HindIII site.
Thus restriction analysis (cutting by NheI and HindIII) of random clones
sampled from the recombined library will indicate the efficiency of
recombination and of the introduction of a specific mutation via the mutagenic
primer. Sequence analysis of randomly-picked (unscreened) clones provides
further information on the recombination and mutagenesis events occurring
during defined primer-based recombination.

Materials and Methods

Defined-primer based recombination

A version of the defined primer based recombination illustrated in FIG.
2 was carried out with the addition of StEP.

1. Preparation of genes to be recombined. About 10 ug of plasmids
containing R1 and R2 genes were digested at 37°C for 1 hour with NdeI and
BamHI (30 U each) in 50 ul of 1x buffer B (Boehringer Mannheim,
Indianapolis, IN). Inserts of approximately 1 kb were purified from 0.8%
preparative agarose gels using QIAEX II gel extraction kit. The DNA inserts
were dissolved in 10 mM Tris-HCl (pH 7.4). The DNA concentrations were
estimated, and the inserts were mixed 1:1 for a concentration of 50 ng/ul.

2. Staggered extension PCR and reassembly. Conditions (100 ul
final volume): about 100 ng inserts were used as template, 50 ng of each of 4
internal primers, 1x Taq buffer, 0.2 mM of each dNTP, 1.5 mM MgCl2 and 0.25
U Taq polymerase. Program: 7 cycles of 30 seconds at 94°C, 15 seconds at
55°C, followed by another 10 cycles of 30 seconds at 94°C, 15 seconds at
55°C, 5 seconds at 72°C (staggered extension), followed by 53 cycles of 30
seconds at 94°C, 15 seconds at 55°C, 1 minute at 72°C (gene assembly).

3. DpnI digestion of the templates. 1 ul of this reaction was diluted
up to 9.5 ul with dH2O and 0.5 ul of DpnI restriction enzyme was added to
digest the DNA template for 45 minutes, followed by incubation at 70°C for 10
minutes and then this 10 ul was used as template in a 10-cycle PCR reaction.

4. PCR amplification of reassembled products. PCR conditions
(100 ul final volume): 30 pmol of each outside primer P5N and P3B, 1x Taq
buffer, 0.2 mM of each dNTP and 2.5 U of Taq polymerase. PCR program: 10
cycles of 30 seconds at 94°C, 30 seconds at 55°C, 1 minute at 72°C. This
program gave a single band at the correct size. The product was purified and
subcloned into pBE3 shuttle vector. This gene library was amplified in E. coli
HB101 and transferred into B. subtilis DB428 competent cells for expression
and screening, as described elsewhere (35). Thermostability of enzyme
variants was determined in the 96-well plate format described previously (33).
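
The three-stage thermocycling program of step 2 can be written down as data and summarized with a short Python sketch. The class, field and stage names are hypothetical (the text names only the second and third stages), and the totals count programmed holds only, not ramp times.

```python
from typing import NamedTuple, Tuple


class Stage(NamedTuple):
    label: str
    cycles: int
    holds: Tuple[Tuple[int, int], ...]  # (temperature in C, seconds)

    def seconds_per_cycle(self) -> int:
        return sum(seconds for _, seconds in self.holds)

    def seconds_at(self, temp_c: int) -> int:
        """Total programmed time at one temperature across all cycles."""
        return self.cycles * sum(s for t, s in self.holds if t == temp_c)


# Stage contents follow the program given in step 2 of the text.
PROGRAM = (
    Stage("denature/anneal only", 7, ((94, 30), (55, 15))),
    Stage("staggered extension", 10, ((94, 30), (55, 15), (72, 5))),
    Stage("gene assembly", 53, ((94, 30), (55, 15), (72, 60))),
)


def total_cycles(program: Tuple[Stage, ...]) -> int:
    return sum(stage.cycles for stage in program)


def total_hold_seconds(program: Tuple[Stage, ...]) -> int:
    return sum(stage.cycles * stage.seconds_per_cycle() for stage in program)


if __name__ == "__main__":
    print("cycles:", total_cycles(PROGRAM))
    print("programmed hold time (s):", total_hold_seconds(PROGRAM))
    for stage in PROGRAM:
        print(stage.label, "time at 72 C (s):", stage.seconds_at(72))
```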

DNA sequencing 

Ten E. coli HB101 transformants were chosen for sequencing. Genes
were purified using QIAprep spin plasmid miniprep kit to obtain sequencing
quality DNA. Sequencing was done on an ABI 373 DNA Sequencing System
using the Dye Terminator Cycle Sequencing kit (Perkin-Elmer, Branchburg,
NJ).

Results 

1) Restriction analysis

Forty clones randomly picked from the recombined library were
digested with restriction enzymes NheI and BamHI. In a separate experiment
the same forty plasmids were digested with HindIII and BamHI. These
reaction products were analyzed by gel electrophoresis. As shown in FIG. 7,
eight out of 40 clones (approximately 20%) contain the newly introduced NheI
restriction site, demonstrating that the mutagenic primer has indeed been
able to introduce the specified mutation into the population.



2) DNA sequence analysis 

The first ten randomly picked clones were subjected to sequence
analysis, and the results are summarized in FIG. 8. A minimum of 6 out of
the 10 genes have undergone recombination. Among these 6 genes, the
minimal crossover events (recombination) between genes R1 and R2 vary from
1 to 4. All visible crossovers occurred within the region defined by the four
primers. Mutations outside this region are rarely, if ever, recombined, as
shown by the fact that there is no recombination between the two mutations
at base positions 484 and 520. These results show that the defined primers
can bias recombination so that it appears most often in the portion of the
sequence defined by the primers (inside the primers). Mutations very close
together also tend to remain together (for example, base substitutions 731 and
745 and base substitutions 1141 and 1153 always remain as a pair).
However, the sequence of clone 7 shows that two mutations as close as 33
bases apart can be recombined (base position at 1107 and 1141).
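
The notion of a minimum number of crossovers can be made concrete with a short Python sketch. It assumes that each sequenced clone is summarized as a string giving, for each marker position in order along the gene, the parent ('1' for R1, '2' for R2) that the base matches. That encoding and the function name are hypothetical and are not taken from the figures.

```python
def minimum_crossovers(marker_origins: str) -> int:
    """Count parent switches between consecutive informative markers.

    Characters other than '1' or '2' (for example a marker whose origin
    cannot be assigned) are skipped.
    """
    informative = [m for m in marker_origins if m in "12"]
    return sum(1 for a, b in zip(informative, informative[1:]) if a != b)


if __name__ == "__main__":
    # Hypothetical marker strings, for illustration only.
    for clone in ("1111", "1122", "12121", "1-2-1"):
        print(clone, minimum_crossovers(clone))
```

Because only switches between adjacent markers are visible, the count is a lower bound: an even number of crossovers between two neighbouring markers leaves no trace.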

Twenty-three new point mutations were introduced in the ten genes 
during the process. This error rate of 0.23% corresponds to 2-3 new point 
mutations per gene, which is a rate that has been determined optimal for 
generating mutant libraries for directed enzyme evolution (15). The mutation
types are listed in Table 4. Mutations are mainly transitions and are evenly
distributed along the gene.



TABLE 4

New point mutations identified in ten recombined genes

Transition   Frequency   Transversion   Frequency

G->A         4           A->T           1
A->G         4           A->C           1
C->T         3           C->A           1
T->C         5           C->G           0
                         G->C           1
                         G->T           0
                         T->A           3
                         T->G           0

A total of 9860 bases were sequenced. The mutation rate was 0.23%.
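
The summary figures can be recomputed from the table counts with a few lines of Python. The counts and the number of bases sequenced are those given above. The per-gene figure assumes that the 9860 bases are spread evenly over the ten genes, which is an assumption made here for illustration.

```python
TRANSITIONS = {"G->A": 4, "A->G": 4, "C->T": 3, "T->C": 5}
# The zero-count row whose label is unreadable in the source table is omitted.
TRANSVERSIONS = {"A->T": 1, "A->C": 1, "C->A": 1, "G->C": 1,
                 "G->T": 0, "T->A": 3, "T->G": 0}
BASES_SEQUENCED = 9860
GENES_SEQUENCED = 10


def summarize() -> dict:
    n_ts = sum(TRANSITIONS.values())
    n_tv = sum(TRANSVERSIONS.values())
    total = n_ts + n_tv
    return {
        "transitions": n_ts,
        "transversions": n_tv,
        "total": total,
        "rate_percent": 100.0 * total / BASES_SEQUENCED,
        "per_gene": total / GENES_SEQUENCED,
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```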



4) Phenotypic analysis 

Approximately 450 B. subtilis DB428 clones were picked and grown in
SG medium supplemented with 20 ug/ml kanamycin in 96-well plates.
Approximately 56% of the clones expressed active enzymes. From previous
experience, we know that this level of inactivation indicates a mutation rate on
the order of 2-3 mutations per gene (35). Approximately 5% of the clones showed
double mutant (N181D+N218S)-like phenotypes (which is below the expected
25% value for random recombination alone due primarily to point
mutagenesis). (DNA sequencing showed that two clones, 7 and 8, from the ten
randomly picked clones contain both N218S and N181D mutations.)

EXAMPLE 5

Optimization of the Actinoplanes utahensis ECB deacylase by
the random-priming recombination method

In this example, the method is used to generate short DNA fragments
from denatured, linear, double-stranded DNA (e.g., restriction fragments
purified by gel electrophoresis; 22). The purified DNA, mixed with a molar
excess of primers, is denatured by boiling, and synthesis is then carried out
using the Klenow fragment of E. coli DNA polymerase I. This enzyme lacks
5'->3' exonuclease activity, so that the random priming product is synthesized
exclusively by primer extension and is not degraded by exonuclease. The
reaction is carried out at pH 6.6, where the 3'->5' exonuclease activity of the
enzyme is much reduced (36). These conditions favor random initiation of
synthesis.

The procedure involves the following steps: 
1. Cleave the DNA of interest with appropriate restriction
endonuclease(s) and purify the DNA fragment of interest by gel electrophoresis
using Wizard PCR Prep Kit (Promega, Madison, WI). As an example, the
Actinoplanes utahensis ECB deacylase gene was cleaved as a 2.4 kb-long
XhoI-PshAI fragment from the recombinant plasmid pSHP100. It was essential to
linearize the DNA for the subsequent denaturation step. The fragment was
purified by agarose gel electrophoresis using the Wizard PCR Prep Kit
(Promega, Madison, WI) (FIG. 9, step (a)). Gel purification was also essential in
order to remove the restriction endonuclease buffer from the DNA, since the
Mg2+ ions make it difficult to denature the DNA in the next step.

2. 400 ng (about 0.51 pmol) of the double-stranded DNA dissolved
in H2O was mixed with 2.75 ug (about 1.39 nmol) of dp(N)6 random primers.
After immersion in boiling water for 3 minutes, the mixture was placed
immediately in an ice/ethanol bath.

The size of the random priming products is an inverse function of the
concentration of primer (33). The presence of high concentrations of primer is
thought to lead to steric hindrance. Under the reaction conditions described
here the random priming products are approximately 200-400 bp, as
determined by electrophoresis through an alkaline agarose gel (FIG. 9 step b).

3. Ten ul of 10x reaction buffer (10x buffer: 900 mM HEPES, pH
6.6; 0.1 M magnesium chloride, 10 mM dithiothreitol, and 5 mM each dATP,
dCTP, dGTP and dTTP) was added to the denatured sample, and the total
volume of the reaction mixture was brought up to 95 ul with H2O.

4. Ten units (about 5 ul) of the Klenow fragment of E. coli DNA
polymerase I was added. All the components were mixed by gently tapping the
outside of the tube and were centrifuged at 12,000 g for 1-2 seconds in a
microfuge to move all the liquid to the bottom. The reaction was carried out at
22°C for 35 minutes.

The rate of the extension depends upon the concentrations of the 
template and the four nucleotide precursors. Because the reaction was 
carried out under conditions that minimize exonucleolytic digestion, the newly 
synthesized products were not degraded to a detectable extent. 

5. After 35 minutes at 22°C, the reaction was terminated by cooling
the sample to 0°C on ice. 100 µl of ice-cold H2O was added to the reaction
mixture.

6. The random primed products were purified by passing the whole
reaction mixture through Centricon-100 (to remove the template and proteins)
and Centricon-10 filters (to remove the primers and fragments less than 50
bases), successively. Centricon filters are available from Amicon Inc. (Beverly,
MA). The retentate fraction (about 85 µl in volume) was recovered from
Centricon-10. This fraction contained the desired random priming products
(FIG. 9, step c) and was used for whole gene reassembly.
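
The molar amounts quoted in step 2 above can be checked with a short calculation. The sketch below assumes an average mass of about 330 g/mol per nucleotide, a common rule of thumb that is not stated in the text, and it counts the 2.4 kb duplex as two single strands. The function and constant names are illustrative only.

```python
# Assumption: ~330 g/mol per nucleotide of single-stranded DNA (rule of thumb,
# not a value given in the text).
AVG_NT_MASS = 330.0  # g/mol per nucleotide


def strand_pmol(mass_ng: float, length_nt: int) -> float:
    """Picomoles of single strands contained in mass_ng of DNA
    whose strands are length_nt nucleotides long."""
    grams = mass_ng * 1e-9
    moles = grams / (length_nt * AVG_NT_MASS)
    return moles * 1e12


# 400 ng of the 2.4 kb fragment, counted as denatured single strands
template_pmol = strand_pmol(400, 2400)
# 2.75 µg (2750 ng) of dp(N)6 hexamers, converted from pmol to nmol
primer_nmol = strand_pmol(2750, 6) / 1000

print(round(template_pmol, 2))  # 0.51
print(round(primer_nmol, 2))    # 1.39
```

Under these assumptions the primers are in a molar excess of roughly 2,750-fold over template strands.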

Reassembly of the whole gene was accomplished by the following steps: 

1. For reassembly by PCR, 5 µl of the random-primed DNA
fragments from Centricon-10, 20 µl of 2x PCR pre-mix (5-fold diluted cloned
Pfu buffer, 0.5 mM each dNTP, 0.1 U/µl cloned Pfu polymerase (Stratagene, La
Jolla, CA)), 8 µl of 30% (v/v) glycerol and 7 µl of H2O were mixed on ice. Since
the concentration of the random-primed DNA fragments used for reassembly
is the most important variable, it is useful to set up several separate reactions
with different concentrations to establish the preferred concentration.

2. After incubation at 96°C for 6 minutes, 40 thermocycles were
performed, each with 1.5 minutes at 95°C, 1.0 minutes at 55°C and 1.5
minutes + 5 second/cycle at 72°C, with the extension step of the last cycle
proceeding at 72°C for 10 minutes, in a DNA Engine PTC-200 (MJ Research
Inc., Watertown, MA) apparatus without adding any mineral oil.

3. 3 µl aliquots at cycles 20, 30 and 40 were removed from the
reaction mixture and analyzed by agarose gel electrophoresis. The
reassembled PCR product at 40 cycles contained the correct size product in a
smear of larger and smaller sizes (see FIG. 9, step d).
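
The reassembly program in step 2 can be written out as an explicit list of holds. The following is a minimal sketch, not a thermocycler script. It assumes that "+ 5 second/cycle" means the 72°C hold grows by 5 seconds with each successive cycle, starting from 1.5 minutes in the first cycle, and that the 10-minute hold replaces the extension of the last cycle. Ramp times are ignored and the function name is hypothetical.

```python
def reassembly_program(cycles=40, denature_s=90, anneal_s=60,
                       extend_s=90, extend_increment_s=5,
                       initial_denature_s=360, final_extend_s=600):
    """Return the program as a list of (label, temperature_C, seconds)."""
    steps = [("initial denaturation", 96, initial_denature_s)]
    for i in range(cycles):
        steps.append((f"cycle {i + 1} denature", 95, denature_s))
        steps.append((f"cycle {i + 1} anneal", 55, anneal_s))
        if i == cycles - 1:
            # Last cycle: extension runs for the long final hold instead.
            extension = final_extend_s
        else:
            # Assumed reading of "+ 5 second/cycle": 90 s, 95 s, 100 s, ...
            extension = extend_s + extend_increment_s * i
        steps.append((f"cycle {i + 1} extend", 72, extension))
    return steps


program = reassembly_program()
total_s = sum(seconds for _, _, seconds in program)
print(len(program), total_s / 60)  # 121 holds, 236.25 minutes of hold time
```

Aliquots at cycles 20, 30 and 40 would correspond to the holds labelled "cycle 20 extend", "cycle 30 extend" and "cycle 40 extend" in this list.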

The correctly reassembled product of this first PCR was further
amplified in a second PCR reaction which contained the PCR primers
complementary to the ends of the template DNA. The amplification procedure
was as follows:

1. 2.0 µl of the PCR reassembly aliquots were used as template in
100-µl standard PCR reactions, which contained 0.2 µM each of primers
xhoF28 (5' GGTAGAGGGAGTGTCGAGGGGGAGATGG 3') (SEQ. ID. NO: 13) and
pshR22 (5' AGGGGGCGTGAGGTGGGTGAGC 3') (SEQ. ID. NO: 14), 1.5 mM
MgCl2, 10 mM Tris-HCl (pH 9.0), 50 mM KCl, 200 µM each of the four dNTPs,
6% (v/v) glycerol, 2.5 U of Taq polymerase (Promega, Madison, WI) and 2.5 U
of Pfu polymerase (Stratagene, La Jolla, CA).

2. After incubation at 96°C for 5 minutes, 15 thermocycles were
performed, each with 1.5 minutes at 95°C, 1.0 minutes at 55°C and 1.5
minutes at 72°C, followed by additional 15 thermocycles of 1.5 minutes at
95°C, 1.0 minutes at 55°C and 1.5 minutes + 5 second/cycle at 72°C with the
extension step of the last cycle proceeding at 72°C for 10 minutes, in a DNA
Engine PTC-200 (MJ Research Inc., Watertown, MA) apparatus without adding
any mineral oil.

3. The amplification resulted in a large amount of PCR product
with the correct size of the ECB deacylase whole gene (FIG. 9, step e).

Cloning was accomplished as follows:

1. The PCR product of ECB deacylase gene was digested with Xho I
and Psh AI restriction enzymes, and cloned into a modified pIJ702 vector.

2. S. lividans TK23 protoplasts were transformed with the above
ligation mixture to form a mutant library.

In situ screening the ECB deacylase mutants

Each transformant within the S. lividans TK23 library obtained as
described above was screened for deacylase activity with an in situ plate assay
method using ECB as substrate. Transformed protoplasts were allowed to
regenerate on R2YE agar plates by incubation at 30°C for 24 hours and to
develop in the presence of thiostrepton for further 48-72 hours. When the
colonies grew to proper size, 6 ml of 45°C purified-agarose (Sigma) solution
containing 0.5 mg/ml ECB in 0.1 M sodium acetate buffer (pH 5.5) was
poured on top of each R2YE-agar plate and allowed to further develop for 18-
24 hours at 37°C. Colonies surrounded by a clearing zone larger than that of a
control colony containing wild-type recombinant plasmid pSHP150-2 were
indicative of more efficient ECB hydrolysis resulting from improved enzyme
properties or improved enzyme expression and secretion level, and were
chosen as potential positive mutants. These colonies were picked for
subsequent preservation and manipulation.

HPLC assay of the ECB deacylase mutants

Single positive transformants were inoculated into 20 ml fermentation
medium containing 5 µg/ml thiostrepton and allowed to grow at 30°C for 48
hours. At this step, all cultures were subjected to HPLC assay using ECB as
substrate. 100 µl of whole broth was used for an HPLC reaction at 30°C for 30
minutes in the presence of 0.1 M NaAc (pH 5.5), 10% (v/v) MeOH and 200
µg/ml of ECB substrate. 20 µl of each reaction mixture was loaded onto a
PolyLC polyhydroxyethyl aspartamide column (4.6 x 100 mm) and eluted by
acetonitrile gradient at a flow rate of 2.2 ml/min. The ECB-nucleus was
detected at 225 nm.

Purification of the ECB deacylase mutants

After the HPLC assay, 2.0 ml pre-cultures of all potential positive
mutants were then used to inoculate 50-ml fermentation medium and allowed
to grow at 30°C, 280 rpm for 96 hours. These 50-ml cultures were then
centrifuged at 7,000 g for 10 minutes. The supernatants were re-centrifuged
at 16,000 g for 20 minutes. The supernatants containing the ECB deacylase
mutant enzymes were stored at -20°C.

The supernatants from the positive mutants were further concentrated
to 1/30 their original volume with an Amicon filtration unit with molecular
weight cutoff of 10 kD. The resulting enzyme samples were diluted with an
equal volume of 50 mM KH2PO4 (pH 6.0) buffer and 1.0 ml was applied to a
Hi-Trap ion exchange column. The binding buffer was 50 mM KH2PO4 (pH 6.0),
and the elution buffer was 50 mM KH2PO4 (pH 6.0) and 1.0 M NaCl. A linear
gradient from 0 to 1.0 M NaCl was applied in 8 column volumes with a flow
rate of 2.7 ml/min. The ECB deacylase mutant fraction eluted at 0.3 M NaCl
and was concentrated and buffer exchanged into 50 mM KH2PO4 (pH 6.0) in
Amicon Centricon-10 units. Enzyme purity was verified by SDS-PAGE, and
the concentration was determined using the Bio-Rad Protein Assay.

Specific activity assay of the ECB deacylase mutants

4.0 of each purified ECB deacylase mutant was used for the activity
assay at 30°C for 0-60 minutes in the presence of 0.1 M NaAc (pH 5.5), 10%
(v/v) MeOH and 200 µg/ml of ECB substrate. 20 µl of each reaction mixture
was loaded onto a PolyLC polyhydroxyethyl aspartamide column (4.6 x 100
mm) and eluted with an acetonitrile gradient at a flow rate of 2.2 ml/min. The
reaction products were monitored at 225 nm and recorded on an IBM PC data
acquisition system. The ECB nucleus peak was numerically integrated and
used to calculate the specific activity of each mutant.

As shown in FIG. 10, after only one round of applying this random-
priming based technique on the wild-type ECB deacylase gene, one mutant
(M16) from 2,012 original transformants was found to possess 2.4 times the
specific activity of the wild-type enzyme. FIG. 11 shows that the activity of
M16 has been increased relative to that of the wild-type enzyme over a broad
pH range.

The nucleotide sequence which encodes the M16 mutant gene is set 
forth in SEQ. ID. NO: 26. The nucleotide sequence for the wild-type ECB 
deacylase gene is set forth in SEQ. ID. NO: 31. 

Other mutant genes which were isolated utilizing the above method
include mutant M2#7, M2#14, M15 and M20. The nucleotide sequences for
these mutant genes are set forth in SEQ. ID. NOS: 27, 28, 29 and 30,
respectively. The amino acid sequences for the ECB deacylases encoded by the
mutant genes are set forth in SEQ. ID. NOS: 32 (M16); 33 (M2#7); 34 (M2#14);
35 (M15); and 36 (M20).

The above-identified mutant genes may be ligated into a suitable
expression vector and incorporated into a host cell or organism for expression.
The resulting ECB deacylase enzyme which is expressed by the transformed
host cell or organism may be isolated, purified and used as an enzyme in a
wide variety of synthetic protocols which require the ECB deacylase enzyme.
Alternatively, the transformed host cell or organism may be incorporated
directly into suitable production broths where the ECB deacylase enzyme is
generated in situ by the transformant.

EXAMPLE 5

Improving the thermostability of Bacillus subtilis subtilisin E
using the random-sequence primer recombination method

This example demonstrates the use of various DNA polymerases for 
primer-based recombination. It further demonstrates the stabilization of 
subtilisin E by recombination. 

Genes R1 and R2 encoding the two thermostable subtilisin E variants
described in Example 1 were chosen as the templates for recombination.

(1) Target gene preparation 

Subtilisin E thermostable mutant genes R1 and R2 (FIG. 11) were
subjected to random primed DNA synthesis. The 986-bp fragment including
45 nt of subtilisin E prosequence, the entire mature sequence and 113 nt after
the stop codon was obtained by double digestion of plasmid pBE3 with Bam
HI and Nde I and purified from a 0.8% agarose gel using the Wizard PCR Prep
Kit (Promega, Madison, WI). It was essential to linearize the DNA for the
subsequent denaturation step. Gel purification was also essential in order to
remove the restriction endonuclease buffer from the DNA, since the Mg2+ ions
make it difficult to denature the DNA in the next step.

(2) Random primed DNA synthesis 

Random primed DNA synthesis was used to generate short DNA fragments
from denatured, linear, double-stranded DNA. The purified B. subtilis
subtilisin E mutant genes, mixed with a molar excess of primers, were
denatured by boiling, and synthesis was then carried out using one of the
following DNA polymerases: the Klenow fragment of E. coli DNA polymerase I,
bacteriophage T4 DNA polymerase and T7 sequenase version 2.0 DNA
polymerase.

Under its optimal performance conditions (29), bacteriophage T4 DNA
polymerase gives synthesis results similar to those of the Klenow fragment. When
T7 sequenase version 2.0 DNA polymerase (31, 32) is used, the lengths of the
synthesized DNA fragments are usually larger. Some amount of MnCl2 has to
be included during the synthesis in order to control the lengths of the
synthesized fragments within 50-400 bases.




Short, nascent DNA fragments can also be generated with PCR using
the Stoffel fragment of Taq DNA polymerase or Pfu DNA polymerase. An
important consideration is to identify by routine experimentation the reaction
conditions which ensure that the short random primers can anneal to the
templates and give sufficient DNA amplification at higher temperatures. We
have found that random primers as short as dp(N)12 can be used with PCR to
generate fragments.

2.1 Random primed DNA synthesis with the Klenow fragment

The Klenow fragment of E. coli DNA polymerase I lacks 5'→3'
exonuclease activity, so that the random priming product is synthesized
exclusively by primer extension and is not degraded by exonuclease. The
reaction was carried out at pH 6.6, where the 3'→5' exonuclease activity of the
enzyme is much reduced (36). These conditions favor random initiation of
synthesis.

1. 200 ng (about 0.7 pmol) of R1 DNA and equal amount of R2 DNA
dissolved in H2O was mixed with 13.25 µg (about 6.7 nmol) of dp(N)6 random
primers. After immersion in boiling water for 5 minutes, the mixture was
placed immediately in an ice/ethanol bath.

The size of the random priming products is an inverse function of the
concentration of primer (30). The presence of high concentrations of primer is
thought to lead to steric hindrance. Under the reaction conditions described
here the random priming products are approximately 50-500 bp, as
determined by agarose gel electrophoresis.

2. Ten µl of 10x reaction buffer [10x buffer: 900 mM HEPES, pH
6.6; 0.1 M magnesium chloride, 20 mM dithiothreitol, and 5 mM each dATP,
dCTP, dGTP and dTTP] was added to the denatured sample, and the total
volume of the reaction mixture was brought up to 95 µl with H2O.

3. Ten units (about 5 µl) of the Klenow fragment of E. coli DNA
polymerase I (Boehringer Mannheim, Indianapolis, IN) was added. All the
components were mixed by gently tapping the outside of the tube and were
centrifuged at 12,000 g for 1-2 seconds in a microfuge to move all the liquid to
the bottom. The reaction was carried out at 22°C for 3 hours.

The rate of the extension depends upon the concentrations of the
template and the four nucleotide precursors. Because the reaction was carried
out under conditions that minimize exonucleolytic digestion, the newly
synthesized products were not degraded to a detectable extent.

4. After 3 hours at 22°C, the reaction was terminated by cooling
the sample to 0°C on ice. 100 µl of ice-cold H2O was added to the reaction
mixture.

5. The random primed products were purified by passing the whole
reaction mixture through Microcon-100 (Amicon, Beverly, MA) (to remove the
template and proteins) and Microcon-10 filters (to remove the primers and
fragments less than 40 bases), successively. The retentate fraction (about 65
µl in volume) was recovered from the Microcon-10. This fraction containing the
desired random priming products was buffer-exchanged against PCR reaction
buffer with a new Microcon-10 for further use in whole gene reassembly.
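
The working concentrations in the Klenow reaction follow from the 10x buffer recipe by simple dilution. The sketch below assumes a final volume of 100 µl (95 µl after adding H2O plus about 5 µl of enzyme) and ignores any contribution from the enzyme storage buffer. The dictionary and function names are illustrative.

```python
def final_conc(stock_conc, stock_volume_ul, final_volume_ul):
    """Concentration after dilution (C1 * V1 = C2 * V2), same units as stock."""
    return stock_conc * stock_volume_ul / final_volume_ul


# 10x Klenow reaction buffer as listed in step 2 (all values in mM)
TEN_X_BUFFER_MM = {"HEPES": 900, "MgCl2": 100, "DTT": 20, "each dNTP": 5}

final_volume_ul = 95 + 5  # assumed: 95 µl mixture plus about 5 µl enzyme
working_mm = {name: final_conc(conc, 10, final_volume_ul)
              for name, conc in TEN_X_BUFFER_MM.items()}

for name, conc in working_mm.items():
    print(f"{name}: {conc} mM")
```

With these assumptions the reaction contains 90 mM HEPES, 10 mM MgCl2, 2 mM DTT and 0.5 mM of each dNTP.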

2.2 Random primed DNA synthesis with bacteriophage T4 DNA polymerase 

Bacteriophage T4 DNA polymerase and the Klenow fragment of E. coli
DNA polymerase I are similar in that each possesses a 5'→3' polymerase
activity and a 3'→5' exonuclease activity. The exonuclease activity of
bacteriophage T4 DNA polymerase is more than 200 times that of the Klenow
fragment. Since T4 DNA polymerase does not displace the short oligonucleotide
primers from single-stranded DNA templates (23), the efficiency of mutagenesis
is different from that of the Klenow fragment.

1. 200 ng (about 0.7 pmol) of R1 DNA and equal amount of R2 DNA
dissolved in H2O was mixed with 13.25 µg (about 6.7 nmol) of dp(N)6 random
primers. After immersion in boiling water for 5 minutes, the mixture was
placed immediately in an ice/ethanol bath. The presence of high
concentrations of primer is thought to lead to steric hindrance.

2. Ten µl of 10x reaction buffer [10x buffer: 500 mM Tris-HCl, pH
8.8; 150 mM (NH4)2SO4; 70 mM magnesium chloride, 100 mM 2-
mercaptoethanol, 0.2 mg/ml bovine serum albumin and 2 mM each dATP,
dCTP, dGTP and dTTP] was added to the denatured sample, and the total
volume of the reaction mixture was brought up to 90 µl with H2O.

3. Ten units (about 10 µl) of T4 DNA polymerase (Boehringer
Mannheim, Indianapolis, IN) was added. All the components were mixed by
gently tapping the outside of the tube and were centrifuged at 12,000 g for 1-2
seconds in a microfuge to move all the liquid to the bottom. The reaction was
carried out at 37°C for 30 minutes. Under the reaction conditions described
here the random priming products are approximately 50-500 bp.

4. After 30 minutes at 37°C, the reaction was terminated by cooling
the sample to 0°C on ice. 100 µl of ice-cold H2O was added to the reaction
mixture.

5. The random primed products were purified by passing the whole
reaction mixture through Microcon-100 (to remove the template and proteins)
and Microcon-10 filters (to remove the primers and fragments less than 40
bases), successively. The retentate fraction (about 65 µl in volume) was
recovered from the Microcon-10. This fraction containing the desired random
priming products was buffer-exchanged against PCR reaction buffer with a
new Microcon-10 for further use in whole gene reassembly.

2.3 Random primed DNA synthesis with the T7 sequenase v2.0 DNA 
polymerase 

Since the T7 sequenase v2.0 DNA polymerase lacks exonuclease
activity and is highly processive, the average length of DNA synthesized is
greater than that of DNAs synthesized by the Klenow fragment or T4 DNA
polymerase. In the presence of a proper amount of MnCl2 in the reaction,
however, the size of the synthesized fragments can be controlled to less than
400 bp.

1. 200 ng (about 0.7 pmol) of R1 DNA and equal amount of R2 DNA
dissolved in H2O was mixed with 13.25 µg (about 6.7 nmol) of dp(N)6 random
primers. After immersion in boiling water for 5 minutes, the mixture was
placed immediately in an ice/ethanol bath. The presence of high
concentrations of primer is thought to lead to steric hindrance.

2. Ten µl of 10x reaction buffer [10x buffer: 400 mM Tris-HCl, pH
7.5; 200 mM magnesium chloride, 500 mM NaCl, 3 mM MnCl2, and 3 mM
each dATP, dCTP, dGTP and dTTP] was added to the denatured sample, and
the total volume of the reaction mixture was brought up to 99.2 µl with H2O.

3. Ten units (about 0.8 µl) of the T7 Sequenase v2.0 (Amersham
Life Science, Cleveland, Ohio) was added. All the components were mixed by
gently tapping the outside of the tube and were centrifuged at 12,000 g for 1-2
seconds in a microfuge to move all the liquid to the bottom. The reaction was
carried out at 22°C for 15 minutes. Under the reaction conditions described
here the random priming products are approximately 50-400 bp.

4. After 15 minutes at 22°C, the reaction was terminated by cooling
the sample to 0°C on ice. 100 µl of ice-cold H2O was added to the reaction
mixture.

5. The random primed products were purified by passing the whole
reaction mixture through Microcon-100 (to remove the template and proteins)
and Microcon-10 filters (to remove the primers and fragments less than 40
bases), successively. The retentate fraction (about 65 µl in volume) was
recovered from the Microcon-10. This fraction containing the desired random
priming products was buffer-exchanged against PCR reaction buffer with a
new Microcon-10 for further use in whole gene reassembly.

2.4 Random primed DNA synthesis with PCR using the Stoffel fragment of 
Taq DNA polymerase 

Similar to the Klenow fragment of E. coli DNA polymerase I, the Stoffel
fragment of Taq DNA polymerase lacks 5' to 3' exonuclease activity. It is also
more thermostable than Taq DNA polymerase. The Stoffel fragment has low
processivity, extending a primer an average of only 5-10 nucleotides before it
dissociates. As a result of its lower processivity, it may also have improved
fidelity.

1. 50 ng (about 0.175 pmol) of R1 DNA and equal amount of R2
DNA dissolved in H2O was mixed with 6.13 µg (about 1.7 nmol) of dp(N)12
random primers.

2. Ten µl of 10x reaction pre-mix (10x reaction pre-mix: 100 mM
Tris-HCl, pH 8.3; 30 mM magnesium chloride, 100 mM KCl, and 2 mM each
dATP, dCTP, dGTP and dTTP) was added, and the total volume of the reaction
mixture was brought up to 99.0 µl with H2O.

3. After incubation at 96°C for 5 minutes, 2.5 units (about 1.0 µl)
of the Stoffel fragment of Taq DNA polymerase (Perkin-Elmer Corp., Norwalk,
CT) was added. Thirty-five thermocycles were performed, each with 60
seconds at 95°C, 60 seconds at 55°C and 50 seconds at 72°C, without the
extension step of the last cycle, in a DNA Engine PTC-200 (MJ Research Inc.,
Watertown, MA) apparatus. Under the reaction conditions described here the
random priming products are approximately 50-500 bp.

4. The reaction was terminated by cooling the sample to 0°C on ice.
100 µl of ice-cold H2O was added to the reaction mixture.

5. The random primed products were purified by passing the whole
reaction mixture through Microcon-100 (to remove the template and proteins)
and Microcon-10 filters (to remove the primers and fragments less than 40
bases), successively. The retentate fraction (about 65 µl in volume) was
recovered from the Microcon-10. This fraction containing the desired random
priming products was buffer-exchanged against PCR reaction buffer with a
new Microcon-10 for further use in whole gene reassembly.

2.5 Random primed DNA synthesis with PCR using Pfu DNA polymerase

Pfu DNA polymerase is extremely thermostable, and the enzyme
possesses an inherent 3' to 5' exonuclease activity but does not possess a
5'→3' exonuclease activity. Its base substitution fidelity has been estimated to
be 2 x 10^-6.

1. 50 ng (about 0.175 pmol) of R1 DNA and equal amount of R2
DNA dissolved in H2O was mixed with 6.13 µg (about 1.7 nmol) of dp(N)12
random primers.

2. Fifty µl of 2x reaction pre-mix [2x reaction pre-mix: 5-fold
diluted cloned Pfu buffer (Stratagene, La Jolla, CA), 0.4 mM each dNTP] was
added, and the total volume of the reaction mixture was brought up to 99.0 µl
with H2O.

3. After incubation at 96°C for 5 minutes, 2.5 units (about 1.0 µl)
of Pfu DNA polymerase (Stratagene, La Jolla, CA) was added. Thirty-five
thermocycles were performed, each with 60 seconds at 95°C, 60 seconds at
55°C and 50 seconds at 72°C, without the extension step of the last cycle, in a
DNA Engine PTC-200 (MJ Research Inc., Watertown, MA) apparatus. Under
the reaction conditions described here the major random priming products are
approximately 50-500 bp.

4. The reaction was terminated by cooling the sample to 0°C on ice.
100 µl of ice-cold H2O was added to the reaction mixture.

5. The random primed products were purified by passing the whole
reaction mixture through Microcon-100 (to remove the template and proteins)
and Microcon-10 filters (to remove the primers and fragments less than 40
bases), successively. The retentate fraction (about 65 µl in volume) was
recovered from the Microcon-10. This fraction containing the desired random
priming products was buffer-exchanged against PCR reaction buffer with a
new Microcon-10 for further use in whole gene reassembly.

(3) Reassembly of the whole gene

1. For reassembly by PCR, 10 µl of the random-primed DNA
fragments from Microcon-10, 20 µl of 2x PCR pre-mix (5-fold diluted cloned
Pfu buffer, 0.5 mM each dNTP, 0.1 U/µl cloned Pfu polymerase (Stratagene, La
Jolla, CA)) and 15 µl of H2O were mixed on ice.

2. After incubation at 96°C for 3 minutes, 40 thermocycles were
performed, each with 1.0 minute at 95°C, 1.0 minute at 55°C and 1.0 minute
+ 5 second/cycle at 72°C, with the extension step of the last cycle proceeding
at 72°C for 10 minutes, in a DNA Engine PTC-200 (MJ Research Inc.,
Watertown, MA) apparatus without adding any mineral oil.

3. 3 µl aliquots at cycles 20, 30 and 40 were removed from the
reaction mixture and analyzed by agarose gel electrophoresis. The
reassembled PCR product at 40 cycles contained the correct size product in a
smear of larger and smaller sizes.

(4) Amplification 

The correctly reassembled product of this first PCR was further
amplified in a second PCR reaction which contained the PCR primers
complementary to the ends of the template DNA.

1. 2.0 µl of the PCR reassembly aliquots were used as template in
100-µl standard PCR reactions, which contained 0.3 µM each of primers P1
(5' CCGAGCGTTGCATATGTGGAAG 3') (SEQ. ID. NO: 15) and P2 (5'
CGACTCTAGAGGATCCGATTC 3') (SEQ. ID. NO: 16), 1.5 mM MgCl2, 10 mM
Tris-HCl (pH 9.0), 50 mM KCl, 200 µM each of the four dNTPs, 2.5 U of Taq
polymerase (Promega, Madison, WI, USA) and 2.5 U of Pfu polymerase
(Stratagene, La Jolla, CA).

2. After incubation at 96°C for 3 minutes, 15 thermocycles were
performed, each with 60 seconds at 95°C, 60 seconds at 55°C and 50 seconds
at 72°C, followed by additional 15 thermocycles of 60 seconds at 95°C, 60
seconds at 55°C and 50 seconds (+ 5 second/cycle) at 72°C with the extension
step of the last cycle proceeding at 72°C for 10 minutes, in a DNA Engine PTC-
200 (MJ Research Inc., Watertown, MA) apparatus without adding any mineral
oil.

3. The amplification resulted in a large amount of PCR product
with the correct size of the subtilisin E whole gene.

(5) Cloning

Since the short DNA fragments were generated with five different DNA
polymerases, there were five pools of final PCR amplified reassembled
products. Each DNA pool was used for constructing the corresponding
subtilisin E mutant library.

1. The PCR amplified reassembled product was purified by Wizard
DNA-CleanUp kit (Promega, Madison, WI), digested with Bam HI and Nde I,
and electrophoresed in a 0.8% agarose gel. The 986-bp product was cut from the
gel and purified by Wizard PCR Prep kit (Promega, Madison, WI). Products
were ligated with vector generated by Bam HI-Nde I digestion of the pBE3
shuttle vector.

2. E. coli HB101 competent cells were transformed with the above
ligation mixture to form a mutant library. About 4,000 transformants from
this library were pooled, and recombinant plasmid mixture was isolated from
this pool.

3. B. subtilis DB428 competent cells were transformed with the
above isolated plasmid mixture to form another library of the subtilisin E
variants.

4. Based on the DNA polymerase used for random priming the
short, nascent DNA fragments, the five libraries constructed here were named:
library/Klenow, library/T4, library/Sequenase, library/Stoffel and library/Pfu.
About 400 transformants from each library were randomly picked and
subjected to screening for thermostability [see Step (7)].

(6) Random clone sequencing

Ten random clones from the B. subtilis DB428 library/Klenow were
chosen for DNA sequence analysis. Recombinant plasmids were individually
purified from B. subtilis DB428 using a QIAprep spin plasmid miniprep kit
(QIAGEN) with the modification that 2 mg/ml lysozyme was added to P1 buffer
and the cells were incubated for 5 minutes at 37°C, retransformed into
competent E. coli HB101 and then purified again using QIAprep spin plasmid
miniprep kit to obtain sequencing quality DNA. Sequencing was done on an
ABI 373 DNA Sequencing System using the Dye Terminator Cycle Sequencing
kit (Perkin-Elmer Corp., Norwalk, CT).

(7) Screening for thermostability 

About 400 transformants from each of the five libraries described at
Step (5) were subjected to screening. Screening was based on the assay
described previously (33, 35), using succinyl-Ala-Ala-Pro-Phe-p-nitroanilide
(SEQ. ID. NO: 25) as substrate. B. subtilis DB428 containing the plasmid
library were grown on LB/kanamycin (20 µg/ml) plates. After 18 hours at
37°C single colonies were picked into 96-well plates containing 100 µl
SG/kanamycin medium per well. These plates were shaken and incubated at
37°C for 24 hours to let the cells grow to saturation. The cells were spun
down, and the supernatants were sampled for the thermostability assay.
Three replica 96-well assay plates were duplicated for each growth plate, with
each well containing 10 µl of supernatant. The subtilisin activities were then
measured by adding 100 µl of activity assay solution (0.2 mM
succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (SEQ. ID. NO: 25), 100 mM Tris-HCl,
10 mM CaCl2, pH 8.0, 37°C). Reaction velocities were measured at 405 nm over
1.0 min. in a ThermoMax microplate reader (Molecular Devices, Sunnyvale CA).
Activity measured at room temperature was used to calculate the fraction of
active clones (clones with activity less than 10% of that of wild type were
scored as inactive). Initial activity (Ai) was measured after incubating one
assay plate at 65°C for 10 minutes by immediately adding 100 µl of prewarmed
(37°C) assay solution (0.2 mM succinyl-Ala-Ala-Pro-Phe-p-nitroanilide (SEQ. ID.
NO: 25), 100 mM Tris-HCl, pH 8.0, 10 mM CaCl2) into each well. Residual
activity (Ar) was measured after 40 minute incubation.
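
The screen above reduces each well to two numbers, Ai and Ar. One way to rank clones from such readings is sketched below. The ranking by Ar/Ai follows from the text; the first-order decay model used to turn the ratio into a rate constant is an assumption, as is the 30-minute interval passed to it, since the text does not state the exact time between the two measurements. The well names and activity values are hypothetical and are not data from this example.

```python
import math


def residual_fraction(a_initial, a_residual):
    """Fraction of activity remaining, Ar / Ai."""
    return a_residual / a_initial


def inactivation_rate(a_initial, a_residual, minutes):
    """Rate constant k (per minute), assuming A(t) = Ai * exp(-k * t)."""
    return -math.log(a_residual / a_initial) / minutes


def half_life(rate_per_min):
    """Half-life in minutes for a first-order process."""
    return math.log(2) / rate_per_min


# Hypothetical (Ai, Ar) readings in arbitrary rate units
wells = {"A1": (120.0, 60.0), "A2": (100.0, 10.0), "A3": (80.0, 64.0)}

# Most thermostable clone first
ranked = sorted(wells, key=lambda w: residual_fraction(*wells[w]), reverse=True)
print(ranked)

k_a1 = inactivation_rate(*wells["A1"], minutes=30)  # assumed interval
print(round(half_life(k_a1), 1))
```

Ranking by Ar/Ai does not depend on the decay model; only the conversion to a rate constant or half-life does.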



(8) Sequence Analysis

After screening, one clone that showed the highest thermostability
within the 400 transformants from the library/Klenow was re-streaked on
LB/kanamycin agar plate, and single colonies derived from this plate were
inoculated into tube cultures, for glycerol stock and plasmid preparation. The
recombinant plasmid was purified using a QIAprep spin plasmid miniprep kit
(QIAGEN) with the modification that 2 mg/ml lysozyme was added to P1 buffer
and the cells were incubated for 5 minutes at 37°C, retransformed into
competent E. coli HB101 and then purified again using QIAprep spin plasmid
miniprep kit to obtain sequencing quality DNA. Sequencing was done on an
ABI 373 DNA Sequencing System using the Dye Terminator Cycle Sequencing
kit (Perkin-Elmer Corp., Norwalk, CT).

Results

1. Recombination frequency and efficiency associated with the
random-sequence recombination.

The random primed process was carried out as described above. The
process is illustrated in FIG. 1. Ten clones from the mutant library/Klenow
were selected at random and sequenced. As summarized in FIG. 12 and Table
5, all clones were different from the parent genes. The frequency of occurrence
of a particular point mutation from parent R1 or R2 in the recombined genes
ranged from 40% to 70%, fluctuating around the expected value of 50%. This
indicates that the two parent genes have been nearly randomly recombined
with the random primer technique. FIG. 12 also shows that all ten mutations
can be recombined or dissected, even those that are only 12 bp apart.
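
With only ten sequenced clones, some scatter around 50% is expected even under ideal random recombination. The sketch below illustrates this with a binomial model. It assumes that each clone inherits a given parental mutation independently with probability 0.5, which is a simplification introduced here and not a model stated in the text.

```python
from math import comb


def binom_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_clones = 10

# Chance that a given mutation appears in 4 to 7 of 10 clones (40% to 70%)
prob_4_to_7 = sum(binom_pmf(k, n_clones) for k in range(4, 8))
print(round(prob_4_to_7, 3))  # 0.773
```

Under this assumption, observed frequencies between 40% and 70% for a single site are the most likely outcome, so the reported range is compatible with near-random recombination.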

We then estimated the rates of subtilisin thermoinactivation at 65°C by
analyzing the 400 random clones from each of the five libraries constructed at
Step (5). The thermostabilities obtained from one 96-well plate are shown in
FIG. 13, plotted in descending order. Approximately 21% of the clones
exhibited thermostability comparable to the mutant with the N181D and
N218S double mutations. This indicates that the N181D mutation from RC2
and the N218S mutation from RC1 have been randomly recombined.
Sequence analysis of the clone exhibiting the highest thermostability among
the screened 400 transformants from the library/Klenow showed that the
mutations N181D and N218S were both present.

2. Frequency of newly introduced mutations during the random
priming process.

Approximately 400 transformants from each of the five B. subtilis
DB428 libraries [see Step (5)] were picked, grown in SG medium
supplemented with 20 µg/ml kanamycin in 96-well plates and subjected to
subtilisin E activity screening. Approximately 77-84% of the clones expressed
active enzymes, while 16-23% of the transformants were inactive, presumably
as a result of newly introduced mutations. From previous experience, we
know that this rate of inactivation indicates a mutation rate on the order of 1
to 2 mutations per gene (35).

As shown in FIG. 12, 18 new point mutations were introduced in the
process. This error rate of 0.18% corresponds to 1-2 new point mutations per
gene, in agreement with the rate determined from the inactivation curve.
Mutations are nearly randomly distributed along the gene.
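
The figures quoted above can be cross-checked with simple arithmetic. The sketch below takes the 18 new point mutations and the 0.18% error rate from the text, and assumes they refer to the ten sequenced Library/Klenow clones; the sequenced length per clone is not stated and is only back-calculated here, so it should be read as an inference rather than a reported value.

```python
def mutations_per_clone(n_mutations: int, n_clones: int) -> float:
    """Average number of new point mutations per sequenced clone."""
    return n_mutations / n_clones


def implied_sequenced_length(n_mutations: int, n_clones: int, error_rate: float) -> float:
    """Length (bp) per clone that would make n_mutations in n_clones
    equal to the stated per-base error rate."""
    return n_mutations / (n_clones * error_rate)


per_clone = mutations_per_clone(18, 10)
length_bp = implied_sequenced_length(18, 10, 0.0018)
print(f"{per_clone:.1f} mutations per clone, about {length_bp:.0f} bp sequenced per clone")
```

On these assumptions the average is 1.8 new mutations per clone, which falls inside the stated range of 1-2 per gene, and the implied sequenced length is on the order of 1 kb per clone.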
TABLE 5

DNA and amino acid residue substitutions in the ten random
clones from Library/Klenow

Clone #   Position   Base           Substitution    Amino Acid      Substitution
                     Substitution   Type            Substitution    Type
-------   --------   ------------   -------------   -------------   --------------
C#1       839        A→G            transition      [illegible]     synonymous
C#2       722        A→G            transition      Ser→Ser         synonymous
C#2       902        T→C            transition      Val→Val         synonymous
C#2       1117       C→G            transversion    Ser→Ser         synonymous
C#4       809        T→C            transition      Asn→Asn         synonymous
C#4       1098       G→C            transversion    Gly→Ala         non-synonymous
C#4       1102       T→C            transition      Ala→Ala         synonymous
C#6       653        C→A            transversion    His→Ile         non-synonymous
C#6       654        A→T            transversion    His→Ile         non-synonymous
C#6       657        T→C            transition      Val→Ala         non-synonymous
C#6       658        A→C            transversion    Val→Ala         non-synonymous
C#6       1144       A→G            transition      Ala→Ala         synonymous
C#6       1147       A→G            transition      Ala→Ala         synonymous
C#7       478        T→C            transition      Ile→Ile         synonymous
C#9       731        A→G            transition      Ala→Ala         synonymous
C#9       994        A→G            transition      Val→Val         synonymous
C#10      1111       A→G            transition      Gly→Gly         synonymous
C#10      1112       A→T            transversion    Thr→Ser         non-synonymous


The mutation types are listed in TABLE 5. The direction of mutation is
clearly nonrandom. For example, A changes more often to G than to either T
or C. All transitions, and in particular T-C and A-G, occur more often than
transversions. Some nucleotides are more mutable than others. One G→C, one
C→G and one C→A transversion were found within the 10 sequenced clones.
These mutations were generated very rarely during the error-prone PCR
mutagenesis of subtilisin (37). The random-priming process may allow access to
a greater range of amino acid substitutions than PCR-based point mutagenesis.
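
The transition/transversion labels used in TABLE 5 follow the standard definition: a change within the purines (A, G) or within the pyrimidines (C, T) is a transition, and a change between the two classes is a transversion. A minimal sketch of that rule is given below; the function name and the handful of example substitutions (taken from those discussed in the text) are chosen for illustration only.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a valid base substitution: {ref}->{alt}")
    # Same chemical class (purine<->purine or pyrimidine<->pyrimidine) is a transition.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


examples = [("T", "C"), ("A", "G"), ("G", "C"), ("C", "G"), ("C", "A")]
labels = [classify_substitution(ref, alt) for ref, alt in examples]
for (ref, alt), label in zip(examples, labels):
    print(f"{ref}->{alt}: {label}")
```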

It is interesting to note that a short stretch of 5' C GGT ACG CAT GTA
GCC GGT ACG 3' (SEQ. ID. NO: 16) at positions 646-667 in parents R1 and
R2 was mutated to 5' C GGT ACG ATT GCC GCC GGT ACG 3' (SEQ. ID. NO:
17) in random clone C#6. Since the stretch contains two short repeats at
both ends, the newly introduced mutations may result from a slipped-strand
mispairing process rather than from a point-mutation-only process. Since there
is no frame-shift, this kind of slippage may be useful for domain conversion.
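
The relationship between this stretch and the four C#6 entries at positions 653-658 in TABLE 5 can be checked by a position-by-position comparison. The sketch below assumes that the first base of the stretch is position 646, as stated, and reads the mutated stretch consistently with the C#6 rows of TABLE 5; the helper name is illustrative.

```python
PARENT = "CGGTACGCATGTAGCCGGTACG"   # positions 646-667 in parents R1 and R2
CLONE_6 = "CGGTACGATTGCCGCCGGTACG"  # same stretch in random clone C#6
START = 646                         # position of the first base, per the text


def list_substitutions(ref: str, alt: str, start: int):
    """Return (position, ref_base, alt_base) for every mismatch between
    two equal-length sequences."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length (no frame-shift)")
    return [(start + i, r, a) for i, (r, a) in enumerate(zip(ref, alt)) if r != a]


substitutions = list_substitutions(PARENT, CLONE_6, START)
for pos, r, a in substitutions:
    print(f"{pos}: {r}->{a}")

# The 7-base repeat flanking the stretch at both ends.
repeat = PARENT[:7]
```

The comparison yields substitutions at positions 653, 654, 657 and 658, and the flanking repeat CGGTACG appears at both ends of the stretch, which is the arrangement that makes slipped-strand mispairing a plausible explanation.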
3. Comparison of different DNA polymerase fidelity in the random-priming
process.

During random-priming recombination, homologous DNA sequences
are nearly randomly recombined and new point mutations are also
introduced. Though these point mutations may provide useful diversity for
some in vitro evolution applications, they are problematic for recombination
of beneficial mutations already identified previously, especially when the
mutation rate is this high. Controlling the error rate during the random
priming process is particularly important for successfully applying this
technique to solve in vitro evolution problems. By choosing different DNA
polymerases and modifying the reaction conditions, the random priming
molecular breeding technique can be adjusted to generate mutant libraries
with different error rates.

The Klenow fragment of E. coli DNA polymerase I, bacteriophage T4 DNA
polymerase, T7 Sequenase version 2.0 DNA polymerase, the Stoffel fragment of
Taq polymerase and Pfu polymerase have been tested for the nascent DNA
fragment synthesis. The activity profiles of the resulting five populations [see
Step (5)] are shown in FIG. 13. To generate these profiles, activities of the
individual clones measured in the 96-well plate screening assay are plotted in
descending order. The Library/Stoffel and Library/Klenow contain a higher
percentage of wild-type or inactive subtilisin E clones than the
Library/Pfu. In all five populations, the percentage of wild-type and inactive
clones ranges from 17-30%.

EXAMPLE 7

Use of defined flanking primers and staggered extension
to recombine single stranded DNA

This example demonstrates the use of defined primer recombination
with staggered extension in the recombination of single stranded DNA.

Method Description

Single-stranded DNA can be prepared by a variety of methods, most
easily from plasmids using helper phage. Many vectors in current use are
derived from filamentous phages, such as M13mp derivatives. After
transformation into cells, these vectors can give rise both to new
double-stranded circles and to single-stranded circles derived from one of the
two strands of the vector. Single-stranded circles are packaged into phage
particles, secreted from cells and can be easily purified from the culture
supernatant.

Two defined primers (for example, hybridizing to 5' and 3' ends of the
templates) are used here to recombine single stranded genes. Only one of the
primers is needed before the final PCR amplification. Extended recombination
primers are first generated by the staggered extension process (StEP), which
consists of repeating cycles of denaturation followed by extremely abbreviated
annealing/extension step(s). The extended fragments are then reassembled
into full-length genes by thermocycling-assisted homologous gene assembly in
the presence of a DNA polymerase, followed by a gene amplification step.

The progress of the staggered extension process is monitored by
removing aliquots (10 µl) from the reaction tube (100 µl starting volume) at
various time points in the primer extension and separating DNA fragments by
agarose gel electrophoresis. Evidence of effective primer extension is seen as
the appearance of a low molecular weight 'smear' early in the process which
increases in molecular weight with increasing cycle number. Initial reaction
conditions are set to allow template denaturation (for example, 94°C-30
second denaturation) followed by very brief annealing/extension step(s) (e.g.
55°C-1 to 15 seconds) repeated through 5-20 cycle increments prior to
reaction sampling. Typically, 20-200 cycles of staggered extension are
required to generate single stranded DNA 'smears' corresponding to sizes
greater than the length of the complete gene.

The experimental design is as in Example 1. The genes of two thermostable
subtilisin E mutants, R1 and R2, are subcloned into vector M13mp18 by
restriction digestion with EcoRI and BamHI. Single stranded DNA is prepared
as described (39).

Two flanking primer based recombination 

Two defined primers, P5N (5'-CCGAG CGTTG CATAT GTGGA AG-3'
(SEQ. ID. NO: 18), containing the NdeI restriction site CATATG) and P3B
(5'-CGACT CTAGA GGATC CGATT C-3' (SEQ. ID. NO: 19), containing the
BamHI restriction site GGATCC), corresponding to 5' and 3' flanking primers,
respectively, are used for recombination. Conditions (100 µl final volume):
0.15 pmol single-stranded DNA containing the R1 and R2 genes (mixed at 1:1)
is used as template, with 15 pmol of one flanking primer (either P5N or P3B),
1x Taq buffer, 0.2 mM of each dNTP, 1.5 mM MgCl2 and 0.25 U Taq polymerase.
Program: 5 minutes at 95°C, 80-200 cycles of 30 seconds at 94°C, 5 seconds
at 55°C. The single-stranded DNA products of correct size (approximately
1 kb) are cut from 0.8% agarose gel after electrophoresis and purified using
QIAEX II gel extraction kit. This purified product is amplified by a
conventional PCR. Conditions (100 µl final volume): 1-10 ng of template, 30
pmol of each flanking primer, 1x Taq buffer, 0.2 mM of each dNTP, 1.5 mM
MgCl2 and 0.25 U Taq polymerase. Program: 5 minutes at 95°C, 20 cycles of
30 seconds at 94°C, 30 seconds at 55°C, 1 minute at 72°C. The PCR product
is purified, digested with NdeI and BamHI and subcloned into pBE3 shuttle
vector. This gene library is amplified in E. coli HB101 and transferred into B.
subtilis DB428 competent cells for expression and screening, as described
elsewhere (35). Thermostability of enzyme variants is determined in the
96-well plate format described previously (33).
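
For planning purposes, the two thermocycling programs described above can be written down as simple data and their programmed hold times added up. The sketch below uses only the times, cycle numbers and amounts given in the protocol; the class and function names are hypothetical, and instrument ramp times between temperatures are ignored, so real run times will be longer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CyclingProgram:
    initial_denaturation_s: int       # single hold before cycling
    cycle_steps_s: Tuple[int, ...]    # hold times within one cycle, in seconds
    cycles: int

    def hold_time_s(self) -> int:
        """Total programmed hold time, excluding temperature ramping."""
        return self.initial_denaturation_s + self.cycles * sum(self.cycle_steps_s)


def step_program(cycles: int) -> CyclingProgram:
    """Staggered extension: 5 min at 95 C, then cycles of 30 s at 94 C, 5 s at 55 C."""
    return CyclingProgram(300, (30, 5), cycles)


# Conventional PCR: 5 min at 95 C, 20 cycles of 30 s / 30 s / 60 s.
pcr_program = CyclingProgram(300, (30, 30, 60), 20)

step_min_s = step_program(80).hold_time_s()
step_max_s = step_program(200).hold_time_s()
pcr_s = pcr_program.hold_time_s()

# Molar excess of primer over template in the staggered extension reaction.
primer_to_template = 15 / 0.15

print(f"StEP: {step_min_s / 60:.0f}-{step_max_s / 60:.0f} min, PCR: {pcr_s / 60:.0f} min")
print(f"primer:template ratio about {primer_to_template:.0f}:1")
```

With these numbers the staggered extension step amounts to 3100 to 7300 seconds of programmed holds (roughly 52 to 122 minutes) for 80 to 200 cycles, the final PCR to 2700 seconds (45 minutes), and the single flanking primer is present at about a 100-fold molar excess over template.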

This protocol results in the generation of novel sequences containing
novel combinations of mutations from the parental sequences as well as novel
point mutations. Screening allows the identification of enzyme variants that
are more thermostable than the parent enzymes, as in Example 1.

As is apparent from the above examples, primer-based recombination
may be used to explore the vast space of potentially useful catalysts for their
optimal performance in a wide range of applications as well as to develop or
evolve new enzymes for basic structure-function studies.

While the present specification describes using DNA-dependent DNA
polymerase and single-stranded DNA as templates, alternative protocols are
also feasible for using single-stranded RNA as a template. By using specific
protein mRNA as the template and RNA-dependent DNA polymerase (reverse
transcriptase) as the catalyst, the methods described herein may be modified
to introduce mutations and crossovers into cDNA clones and to create
molecular diversity directly from the mRNA level to achieve the goal of
optimizing protein functions. This would greatly simplify the ETS
(expression-tagged strategy) for novel catalyst discovery.

In addition to the above, the present invention is also useful to probe
proteins from obligate intracellular pathogens or other systems where cells of
interest cannot be propagated (38).

Having thus described exemplary embodiments of the present invention,
it should be noted by those skilled in the art that the within disclosures
are exemplary only and that various other alternatives, adaptations, and
modifications may be made within the scope of the present invention.
Accordingly, the present invention is not limited to the specific embodiments
as illustrated herein, but is only limited by the following claims.
SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANTS: Frances H. Arnold 
Zhixin Shao 
Joseph A. Affholter 
Huimin Zhao 
Lorraine J. Giver 

(ii) TITLE OF INVENTION: Recombination of Polynucleotide 
Sequences Using Defined or Random Primer Sequences 

(iii) NUMBER OF SEQUENCES: 36 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Oppenheimer Wolff & Donnelly LLP 

(B) STREET: 2029 Century Park East, Suite 3800 

(C) CITY: Los Angeles 

(D) STATE: CA 

(E) COUNTRY: USA 

(F) ZIP: 90067 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: Windows 

(D) SOFTWARE: Microsoft Word 6.0 

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: 60/041,666 

(B) FILING DATE: March 25, 1997 

(C) APPLICATION NUMBER: 60/045,211 

(D) FILING DATE: April 30, 1997 

(E) APPLICATION NUMBER: 60/046,256 

(F) FILING DATE: May 12, 1997 

(G) APPLICATION NUMBER: 08/905,359 

(H) FILING DATE: August 4, 1997 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Oldenkamp, David J.

(B) REGISTRATION NUMBER: 29,421

(C) REFERENCE/DOCKET NUMBER: 330187-89

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (310) 788-5000 

(B) TELEFAX: (310) 277-1297 

WO 98/42728                                              PCT/US98/05814

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 nucleotides

(B) TYPE: nucleotide

(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCG AGC GTT GCA TAT CTG GAA G



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CGA CTC TAG AGG ATC CGA TTC



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GAG CAC ATC AOA TCT ATT AAC 



(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
OGA GTG GCT CAC AGT CGG TGG 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 
(ii) MOLECULE TYPE: oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TTG AAC TAT CGG CTG GGG COG 



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TTA CTA OGG AAO CCG CTG OCA 



(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
TCA GAG ATT ACG ATC GAA AAC 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
GGA TTG TAT CGT GTG AGA AAG 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
AAT GCC GQA AGC AGC CCC TTC

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CAC GAC AGO AAG ATT TTG ACT 

(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 nucleotides
(B) TYPE: nucleotide
(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ACT TAA TCT AGA GGO TAT TA 

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AGC CTC GCG GGA TCC CCG GG

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
GGT AGA GCG AGT CTC GAG GGG GAG ATG C 
(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AGC CGG COT GAC GTG GGT CAG C 



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
CCG AGC GTT GCA TAT GTG GAA G 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CGA CTC TAG AGG ATC CGA TTC



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
CGG TAC GCA TGT AGC CGG TAC G

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
CGG TAC GAT TGC CGC CGG TAC G



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 nucleotides

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
CCG AGC GTT GCA TAT GTG GAA G 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
CGA CTC TAG AGG ATC CGA TTC



(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 nucleotides

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GGC QGA GCT AGC TTC GTA 



(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 nucleotides

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GAT GTG ATG GCT CCT GGC 



(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
CAG AAC ACC GAT TGA GTT 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 nucleotides

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
AGT GCT TTC TAA ACG ATC 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4 amino acids

(B) TYPE: peptide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
Ala Ala Pro Phe 



(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4050 nucleotides

(B) TYPE: nucleotide

(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: polynucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

GOGATCCTCT AAAGTCOACC TGCA0C6T6C CCAGCT6TTC GTGOTGGTGA TCQ0G6CCGC 60

GCTGOCCQCC GTCCSOGGTCG CCGCOGCCOQ GCCOATCGAG TTCOTOGCCT TCC3T0GTGCC 120 

GCAGATCGCC CTQGGGCTCT GOGGOGGCAG CCQGCCGCCC CT6CTC6CCT CGOCQATOCT 180 

CGGCQCQCTQ CTGGTGGTCG GCGCOGACCT GGTCQCTCAO ATCOTOGTOG CGCOQAAGGA 240 

GCTGCCGOTC QGCCTGCTCA CCGCGATGAT CGGCACCCCG TACCTGCTCT GGCTCCTGCT 300 

TCGGCGATCA AGAAAGGTGA GCGGATOAAC GCCOGCCTGC GTCGCGAGGG CCTGCACCTC 360 

GCGTACGGGG ACCTrSACCGT GATCGAOQOC CTCGACGTC6 ACGTGCACGA COGGCTGGTC 420 

ACCACCATCA TOGGGCCCAA OGGGTGCGGC AAGTCGAOGC TGCTCAAG6C GCT0GGCC6G 480 

CTGCTGCGCC C6AC0GGCG0 GCAGGTGCTG CTG6ACGGCC GCCGCAT06A CCGQACCCCC 540 

ACCCGTGACQ TGGCCCGGGT GCTOQQCGTG CTGCCGCAGT OOCCCACCGC GCCCGAAOGG 600 

CTCACCGTCG CCGACCTGGT GATGC6GGGC CGGCACCCGC ACCAGACCTG GTTCCGOCAa 660 

TG0TC6CG0G ACGACGAOGA CCAGGTCQCC GAGGCGCTGC GCTGGACCGA CATGCTGGCG 720 

TACGCGOACC GCCCGGTGGA OGCCCTCTCC GGCGGTCAGC GCCAGCGCGC CTGGATCAGC 780

ATGGCGCTGG CCCA0G6CAC C6ACCT6CTG CTGCTG6ACG AGCCGACCAC CTTCCTCGAC 840 

CTGGCCCACC AGATCGACGT GCTGQACCTG GTCOGCOGGC TGCACGCCGA GATGGGCOGG 900 

ACOOTGGTGA TGGTGCTGCA CGACCTGAGC CTGGCCQCCC GGTACGCOGA CCGGCTOATC 960 

OCGATGAAGG ACG6CCGGAT CGTGGCGA6C 6GG6G6CCGG ACGAGGTGCT CACCC0G6C6 1020 

CTGCTGQAGT CGGTCTTCGO GCTQCGCQCG ATGQTGGTOC CCGACCCGGC GACCOGCACC 1080 

CCGCTGOTGA TCOCCCTGCC GCGCACOOCC ACCTCGGTGC GOGCCTOAAA TCGATQAQCG 1140 

T6GTT0CTTC ATCGGCCT6C CGAGGGATQA QAOTATGTGG G0GGTAC3AGC OAGTCCGGAO 1200 

GGGGAGATGC GGCCOTGACG TCCTOOTACA TGCGCCTGAA AGCAGCAOOG ATC6CCTTG6 1260 

GTGTGATCGT GGOGACOQCA ACOGTGCCGT CACCCGCTTC CGGCAOGGAA CATGAOQGCG 1320 

GCIATGCGGC CCTGATCCGC CGGGCCTCGT ACGGCGTCCC 6CACATCACC GCGGACGACT 1380 

TC0GGA6CCT CGGTTTCG6C GT0GGGTA06 TOCAOGCOGA GGACAACATC TGCGTCATCG 1440 

CCGAGAGCGT GGTGACOGCC AACGGTGAGC GGTCGCGGTS OTTCGGTGCG ACCOGGCCGG 1500 
ACGACGCCOA TOTGCGCAGC GAOCTCTTCC ACCGCAAGGC GATCGACOAC CGCGTCGCOG 1560

AGC3GGCTCCT CGAAGOOCCC CGCGAOQGCO TGCGGGCGCC OTCCSGAOOAC GTCCGQOACC 1620 

AGATGCGCGQ CTTCGTCGCC GGCTACAACC ACTTCCTAOO CCQCACJOaGC GTGCACCGCC 1680 

TGACCOACCC QQCGTOCCGC GGCAAOGCCT OOGnOCGCCC GCTCTCCGAG ATOGATCTCT 1740 

GGC3GTACArr GTOOOACAGC AT6GTCCGC3G CCGGTTCCGG GGCGCTGCTC GAOGGCATCG 1800 

TCOCCOCQAC GCXJICCGACA GCCOCSOaOQC COOCGTCAGC CCOGGAGGC». CCCGAC3CSC0G 1860 

CC3GCGAT0GC CSGCCOCCCTC GACOaOACGA OOQCGGGCAT CGGCAGCAAC 600TACGGCC 1920 

TCGGCX5CGCA GGCCACCGTO ARCGGCAGCO GGATGGTGCT GGCCAACCCG CACTTCXXXn' 1980 

GGCAGGGCGC CGCACGCTTC TACCXSGATQC ACCTCAAGGT GCCOGGCCQC TAOGACGTCG 2040 

AG0GCGCX3GC GCTOGTCGGC GACCCQATCA TCGAGATCGG GCACAACOGC ACQQTOGCCT 2100 

GGAGCCACAC CGTCTCCACC GCCCGCOGGT TCaTOTGOCA CCGCCTQAGC CT06T0CCC0 2160 

OCGACCCCAC CTCCTATTAC GTCGACGGOC GGCCXXafiCG GATGCGC6CC CGCACGGTCA 2220 

CXSGTCCA6AC CGQCAGCOQC COOGTCAGCC GCAOCTTCCA CGACACCCGC TACG6CCCX3G .2280 

TCGCSCGTCGT GCOGGGCACC TTCGACTGQA CGCXXSGCCAC OGCGTACOCC ATCACOQAOG 2340 

TCAACGCGOO CAACAACCGC GCCTTCGACG GGTGGCTGCG OATGOGCCAG OCCAAGGAOS 2400 

TOCQQGCGCT CAAGGCGQTC CTCGACOOGC ACCAOTTCCT GCCCTGGGTC AACQTQATCG 2460 

CCGCCGACGC GCGGGGCGAG GCCCTCTACX3 GC6ATCATTC GGTCOTCCCC CGGGTCACCG 2520 

GCGCGCTCGC TOCCGCCTGC ATCCCGGCQC CGTTCCAGCC GCTCTACQCC TCCAQCGGCC 2580 

AGGCXSGTCCT OOACXKJTTCC COGTCQQACT GCGCQCTOGG CGCCGACCCC QAC0CCGCX3G 2640 

TCCCGGGCAT TCTCGGCC03 GCX3AflCCTGC CGGTGCGGTT CCXXX3ACQAC TACGTCACCA 2700 

ACTCCAACGA CAQTCACTGQ CTGQCCAGCC OGGCCGCCCC GCTOOAAGGC TTCCCGCGGA 2760 

TCCTCX5GCAA C6AAC3GCACC CC3GCGCAfiOC TCCGCACCCG GCTCGGGCTG GACCSW3ATCC 2820 

AGCAGOGCCT CGCCGGCACG GACGGTCTGC C0G0CAAQ66 CTTCACCACC GCCOGGCTCT 2880 

GGCAOGTCAT GTTOGGCAAC CXaOATGCACXl 0CGCCX3AACT CGTCCGCGAC GACCTGGTCG 2940 

CGCTCTQCCG CCGCXaGCCG ACCGOGACOO CCTOGAACXSG CGOQATCGTC GACCTCACCG 3000 

CGGCCTGCAC GGCGCTOTCC OGCTTCGATG AGCGTGCCGA CCTGGACAGC OGGGGCGCGC 3060 

ACCTGTTCAC CGAGTTCGCC CTC0CXX3GCG GAATCAGGTT CGCCGACACC TTCGAGGTGA 3120 

CXX3ATCC00T ACGCACCC3C0 CGCCOTCTGA ACACCACGGA TCCQCQGQTA CXX3ACGGCGC 3180 

TOGCCQACGC OGTOCAACGG CTCGCCGGCA TCCCCCTCGA CGCX3AAGCTG GGAGACATTC 3240 
ACACCGAGAO GC0C6GCXSAA CGGG6CATCC CCATCCAC!GG TGOCCOCOOO GAftOCAGOaV 3300

CCTTCAACGT GATCACCAAC COGCTCGTGC COtSGCGTGGG ATACOCGCAO GTa3TCCAC0 3360 

GAACATCQTT CGTOATOGCC GTCOAACTOG GCCXZGCACGG CCGGTOGGGA CGGCAOATCC 3420 

TCAOCTATGC 0CAGTCGACX3 AACCC3GAACT CACCCTGOTA CGCOGACXAfi ACOGTOCTCT 3480 

ACTCXSCGGAA GGGCTGGGAC ACCATCAAGT ACACCGAGGC GCAGATCGCX3 QCCGACCGGA 3540 

ACCTGCGCX3T CTACOGGQTG GCACAGCQGG GACX3CTQACC CACGTCACGC CGQCTCGGCC 3600 

CGTGCGGGGO CGCAGOGCGC CQATCGTCTC TGCATCGCCO GTCflGCCGGG GCCTGOGTCG 3660 

ACCGGCGGCO GCCGGTCX3AC GCC06CGTCC CGG06CAG0G ACTGGCTQAA OCGCCAGGCG 3720 

TCGGCGGCXX: GaGaCAGOTT GTTGAACATC ACGTAOGCCQ GGCOQCCGTC GAOGATOCOG 3780 

GOGAQGTGTO CCAGCTCXK5C ATCCGTGTAC ACATGCOGGG CGCOGGTOAT GCCGTGCAGC 3840 

CX3GThATAGG CCATCGGGGT CAGACTGOGG CGCAG6AACG GGTCX^GCGGC GTGGGTCAGG 3900 

TCCAGCTCCT GGCACAAGCC CTCGACCACC TCGTCCGGCC ACGGGCCGCG OGGCTCXXAC 3960 

AACAGCCGGA CACCGGCCGQ CCGOOGCGCT CGGGOGCAQA ACTCACGCAG 1CGC60QATG 4020 

GCGGGTTCGG TCGGCXXX3AA ACTCGCCGGG 4050 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 4101 nucleotides

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: polynucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

AAGCTTGCAT GCCTGCAGCG TGCCCAGCTG TTCOTGGTGG TGATCGCGGC OGOGCTGGCC 60 

GCCGTCG060 TOOCCGCCGC CGGGCOGATC GAGTTCGTCQ CCTTOGTCGT GCCGCAGATC 120 

GCCCraCGGC TCTQOGGCGG CAOCCGGCCG CCCCTGCTCO CCTCGGCGAT GCTCGGCGCO 180 

CTGCTGGTGG TCOGGGCOGA CCTGGTCGCT CAGATOGTGG TGGOGCCGAA GGAOCTGCCG 240 

GTCGGCCTGC TCACOGOGAT GATOGGCACC COOTACCTGC TCTGGCTCCT GCTCCGGCGA 300 

TCAAGAAAGG TQAGCGGATG AAOGCCCGCC TGCGTQGGQA GGGCCT6CAC CTOGCGTACG 360 

GGGACCTGAC CGTQATCGAC GGCCTCQAGG TCGACGT6CA CGACGGGCTG GTCACCACCA 420 

TCATCGGGCC CAACGGGTGC G6CAAGTCGA C6CT6CTCAA GGCGCT0G6C G6GCTGCTGC 480 

GCCCGAOOGG CG6GCAGGTG CTGCTGGACG 6C06CCGCAT CGACCGGACC CCCACCCGTG 540 

ACGTOQCCCG GGTCiCTCGGC GrTQCTGCCGC AGTCGCCCAC CX3CG0CCGAA GGGCTCACCO 600

TCGCOOACCT GGTGATGCX3C QGCCGGCACC C!GCACCAGAC CTGGTTCCGG CAGTGOTOGC 660

OCGACOACGA QGAC3CAGSTC OCCGAOOCOC TOCOCTGOAC OGACATQCTQ GCGTACGCGG 720

ACXX3CCCX3GT GGACGCCCTC TCCC3GCQGTC AGOGCCAOCG OGCCTGGATC AGCATGGCGC 780

TOGCCCAGGQ CACCQACCTG CTGCTGCTGG ACOAGCCGAC CACCTTCCTC GACCTGGCCC 840

ACCAGATCGA CGTOCTGOAC CTOCJTCCGCC GGC1GCAG6C CGAOATGGGC CGQACCGTGG 900

TGATGOTGCT GCACGACCTG AGCCTGC3CGG 0CCGGTAG6C C3GACCGGCTG ATCGCGATGA 960

AGQAC06CCG GATCGTGGCG ttGCGCGGCGC C6GACGAG6T GCTCACCCCG GCGCTGCTGT 1020

AOTCGGTCTT CGGGCTGCGC GCGATGaTOG TQCCCGACCC GGCX3ACCGGC ACCCCQCTGG 1080

TGATCCCCCT OCCGOGCACC GCCACCTCGQ TG0GG6CCTG AAATCGATGA GCGTGGTT6C 1140

TTCATCGGCC TGCCGAGCGA TGAGAGTATG TCG6C3GGIAO AGCGAGTCTC GAGGGGGAGA 1200

TGCOGCCGTG ACQTCCTCOT ACATOCQCCT GAAAOCAGCA QCQATCGCCT TCGGTGTGAT 1260

CX3TGGCGA0C GCAGCCGTGC CGTCACXCTC TTCCGGCAGG GAACATGACX3 GCGGCTATGC 1320

GGOCXriGATC CX3C0GGGCCT OGTACGGCGT CCCGCACATC ACCGCCXSACX5 ACTTCGGGAG 1380

CCTCGGTTTC GGOGTCGOQT ACXJTGCAGGC CGAGGACAAC ATCTGCXJTCA TCGCCGAGAG 1440

CGTGGTAACQ GCCAACGGTG AOCGGTCGCG GTGGTTCGGT GCGACCGGGC CGGACGACGC 1500

COATOTGCOC AGOGACCrCT TCCACCX3CAA GGCGATCGAC GACCGCGTCG CCGAGCGGCT 1560

CCTCGAAGGG CCCCGCGACG QOGTGCGGGC GCCOTCGGAC QACGTCCGGG ACCAQATGCG 1620

CGGCTTCGTC GCCGGCTACA ACCACTTCCT ACGCCGCACC GGCGTGCACC GCCTGACCGA 1680

CCCGGGGT6C OGCGGCAAGG CCTGGGTGCG CCCGCTCTCC GAGATCX3ATC TCTGGCGTAC 1740

GTCGTGGGAC AGCATGGTCC GGGCCGGTTC CGGGGCGCTG CTCQACX3GCA T0GTCGCXX3C 1800

GACGCCACCT ACA6CCGGCG GGCCCGCGTC AGCCCCGGAG GCACCCGACG CCGCCGCGAT 1860

CGCCGCCGCC CTCGACQGGA CGAGCX3CGGG CATCGGCAGC AACGCGTACG GCCTCGGCGC 1920

OCAOGCCACC GTGAACGGCA GCGGGATGOT GCTGGCCAAC CCGCACTTCC CGTGGCAGGG 1980

C6CCGAACGC TTCTACCGGA TGCACCTCAA GOTGCCCX3GC CX3CTACGACG TCGAGGGCGC 2040

GGCGCTGATC GGCGACCXX3A TCATCGGGAT CGGGCACAAC CGCACGGTCO CCTGOAGCCA 2100

CACCOTCTCC ACCGCCCX3CC GGTTCGTGTG GCACCGCCTG AGCCTCGTGC CCGOCGACCC 2160

CACCTCCTAT TACGTCGACG GCCG6CCCGA GCGGATGCGC GCCCGCACGQ TCACGGTCCA 2220

GACC3GGCAGC GGCCCGGTCA GCCGCACCTT CCACGACACC CGCTACGGCC CGGTGGCCGT 2280

GAT6CCGGGC ACCTTGQACT G6AC6CC60C CACCGCX3TAC GCCATCACOG ACOTCAACGC 2340

G6QCAACAAC CGCGCCTTCX3 ACGGGTQGCT GCCSGATGOGC CAG6CCAAG0 ACQTCCGGGC 2400 

GCTCAAOGCQ OTCCTCXSACC OOCACCAGTT CCTGCCCTGO QTCAAGOTGA TCXSCCGCCOA 2460 

CG06CXXK3GC GAOGCCCTCT ACGGGOATCA TTGGGTCGTC CCCC6GGTGA C0GGCGC6CT 2520 

CXSCTGCCGCC TGCJATCCCGG GQCCGTTCCA GCCGCTCTAC 6CCTCCAGCG GCCftGGOGGT 2580 

CCTOQACXSOT TCCCXSGTCaO ACTG0GCX3CT CXSOCOCCOAC OCCOACGCCG CGGTCCC3G0Q 2640 

CATTCT06GC CCGQGGAGCC TGCGOGTGOG GTTCCGCQAC QACTACOTCA CCAACTGCAA 2700 

CGACAGTCAC TOOCTOGCCA GCOCOOCCGC CCCOCTaORA GGCTTCCOGC GOATCCTCGG 2760 

CAACGAACGC ACCCC6CX3CA GCCTG06CAC C C OGCTOGGG CTGGACGAOA TCCAGCAGCG 2820 

CCTCGCCGGC AC6GACX3GTC TGCXX3GGCAA GGGCTTCACC AOOQCCCGGC TCrGGCftGQT 2880 

CATSTTCGGC AACCX3QATGC ACQGOGCCQA ACTG6CC00C GAOQACCTOa TCXICGCTCTQ 2940 

CCGCCGCCAG CCQACCGCGA CCGCCIGGAA CGGOGCOATC 0T06ACCTCA CC6GGG0CTG 3000 

CAG6GCGCTG TCCCGCTTOG ATGAGOGTGC CX3ACCTGGAC AGCCGGOGCG OGCACCTGTT 3060 

CACCQAOTTC GCCCTCGCX3G GCOGAATC3M3 GTTCGCC6AC ACCTTCGAGO TOACCGATCC 3120 

GGTACGCACC CCG06CCGTC TGAACACCAC GQATC0GCX3G GTACX3GACGQ OGCTCGCOQA 3180 

CGCCX3TQCAA CGGCTCQCOG GCATCCXX^CT CGAGGCGAAQ CTGOGAGACA TCCACACCGA 3240 

CAOCCGGGGC GAACGGCX3CA TCCCCATCCA CX3GIGGCC0C GGGQAASCAS QCACCTTCAA 3300 

CGTQATCACC AAOCCGCTCG TQCOGGGCGT GGGATACCCG CAOGTCOTCC ACGGAACATC 3360 

GTTCGTGATG OCOQTCGAAC TCGGCCCOCA OGGCCCXSTCG GGAGGGCAQA TCCTCACCTA 3420 

T6CGCAGTCG ACQAACCCGA ACTCACCCTG GTAOGCOQAC CAGACCX3TGC TCTACTCGCG 3480 

GAA6GGCTG0 QACACCATCA AGT3VCACCQA GGCGCAOATC G0GGCG6ACC CGAACCTGC36 3540 

CGTCTACCQG GTGGCACAGC GOGGAGQCTG ACCCAOGTCA CGOOGGCTGG 6CC0GTGCX3G 3600 

GGGCGCAOOO CQCCGATCOT CTCTGCATCO CCGGTCAOCC OGGQCCTGGG TCQACOGGGQ 3660 
GCGGCCGGTC GA06CC0GCG TCCXZGQCGCA GOQACTQGCT GAAGGGCCAO GCGTOGGCGG . 3720 

CCCGGOOCAG GTTGTrOAAC ATCACGTACG CCQOQOCXSCC GTCGAGGATG C0GGCX3AGGT 37B0 

GIGCCAOCTC GGCATCCGTA TACACATGCC GGGCGOOGGT GATGCOGTGC AGCCGGTAAT 3840 

AGGCCATCGO CGTCAGACTQ OGGCGCAQQA A06GGTGGGC GQOGTGGGTC AGGTCCAGCT 3900 

CCTGGCACAA GCCCTOQACC ACCTCXITCC6 GCCAOOGGCC GCGCGGCTCC CACAACAOCX: 3960 

GGACACCGGC CX3GCCQGCGC GCT0GG0CX3C AGAACTCACO CAOTCGCGCG ATGGGGGGTT 4020 
CXSGTCGOCOQ GAAACTCGCC GQQCACT6CA aOTOGAjCTCT AOAOGATCCC CGGQTACOQA 4080

GCTCOAATTC GTAATCATGT C 4101 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4093 nucleotides

(B) TYPE: nucleotide

(C) TOPOLOGY: linear

(ii) MOLECULE TYPE: polynucleotide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

AAGCTTGCAT GCCTGCA6CQ TQCCCA6CT0 TTC6TG6TGG TGATCGCGGC C6CGCT66CC 60 

GCCGTCGCGG TCGCCGCCGC CGGGCCQATC QAGTTCGTCQ CCTTCGTrCGT GCCQCAflATC 120 

GCCCTOGGGC TCTQCC3C3CGa CAGCCGGCOG CCCCTGCTOG CCTCC3GaC3AT GCTCGGCGOG 180 

CTGCTGGTGO TCGOCGCCGA CCTGGTCGCT CAGATCGTGG TQGC6CC6AA G6AGCTGCCG 240 

GTOQGCCTGC TCACCQC6AT GATCGGCAiOC OOGTACCTGC TCTQGCTCCT 6CTTCGGCQA 300 

TCAAGAAAGG TGAOCGGATQ AACGOCCGCC TGOGTQGOQA GGGCCTQCAC CTCGCGTACG 360 

GOGACCTGAC OGTQATOGAC GGCCTCGA06 TCGAOGTGCA CQACGGMSCTG GTCACCACCA 420 

TCAT0G6GCC CAAGGaOTQC GGCAAGTCX3A 06CTGCTCAA GGOGCTCGGC CGGCTGCTGC 480 

GCGCGACCGO CQGGCA6GTG CTGCTGQACQ GCCGCG6CAT CQACC6QACC CCCACCCGTO 540 

ACGTOGCCCG GGTGCT06GC GTGCTGCCGC AGTCGCCCAC CGCGCCCGAA GGGCTCACOG 600 

TCGCOGACCT OGTGATGCGC GQCCGGCACC OQCACCAGAC CTGGTTCCQO CA0T6GTC0C 660 

GCGAC6ACQA 6QACCAGGTC GCCGAC006C TGCGCTGQAC CGACATGCTG GCGTACGCGG 720 

ACOGCCCGGT GGACGCCCTC TCCGGCGOTC AGCGCCAGOO CQCCTGGATC AGCATG6CGC 780 

TGGCCCAGQG CACGOACCTG CIGCTQCIGG ACQAGCCQAC CACCTTCCTC GACCTOGCCC 840 

ACCAGATCGA 06T6CTGGAC CTGGTCCGCC GGCTGCAiCGC CGAGATGGGC CGGACCSTGG 900 

TGATGGTGCr GCACGACCTQ AGCCTGGCC6 CCCQGTACGC CGACCGGCTG AT0GC6ATGA 960 

AGQACGGCCQ GATCQTGGG6 AGCGGGGC6C GGGAC6AGGT GCTCACCCCG GCGCTGCIGG 1020 

AGTCGOTCTT 0GGGCTQC6C GCGATGGT6G TGCCCGACCC GOCGACCGGC ACOCOQCTQQ 1080 

TGATCOCCCT GCCGGGCACC GCCACCTCGG TGCQGGCCTO AAATOJATGA GOGTGGTTGC 1140 

TTCAT0G6CC TSCCOAGOQA T6AGAGTAT6 TG6G0GGTA6 AfiOGAGTCCC GAGG6GGAGA 1200 

TGCCQCCGTO ACGTCCTCOT ACATGCGCCT GAAAGCAGCA GCGATCGCCT TCGGTOTGAT 1260 
cxrrocscoAcc gcaaccxstgc cqtcacccgc TTCCQGCAGG QAACATQACG QOGGCTATGC 1320

GGCCCTGATC CGCCGG6CCT CXSTAOGGCXxT CCCGCACATC ACCGCGGACO ACTTCX3GQAG 1380

CCTCCSGTrrc ggcgtogggt acgtqcaggc GGAGGACAAC ATCTGCGTCA TCGCOGAOAO 1440

CCriGGTGACO GCCAACGQTG AGOOQTCacO GTGGTTCGGT GCXSACCGOGC GGGACSOACGC 1500

OQHTaTQCXSC AGCQACCTCT TCCACCGCAA GGCGATOGAC GACCGGGTCG CCXSAOCGGCT 1560

CCTCGAAOGQ CCCCGC0AC6 GCGTGCGGGC GCCGTCGGAC 6AC6TCCGGG ACCAQATGCG 1620

OQGCTTCQTC GCOGGCTACA ACCACTTCCT AOGCCGCACC GGCGT6CACC GGCTGACXZGA 1680

CCCX3GCGTGC CGCGGCAAGG CCTGQG1X3CG CCOGCTCTCC OAGATCQATC TCTQGCGTAC 1740

AtTGTQGOAC AGCA.TGQTCC GGGCOGGTTC CGGQGCGCTG CTOGACQGCA TCGTGGCCX3C 1800

GACOCCACCG ACAGCCGCCX3 GGCCOGCGTC AOCCCCGGAG GCACCCGACO CCGCCQGGAT 1860

CGCCGCCGCC CTCGACG6GA OGAGCX3G6GG CATCGGCA6C AACGCGTACG GCCTCGGCGC 1920

GCAOGCCACC GTQAACX3QCA GCGGGATOGT GCIGGCCAAC CC6CACTTCC CGTG6CA0GG 1980

CGCCGCACGC TTCTACCGGA TGCACCTCAA GGTGCCOGGC CGCTACX3ACG TCX3AGGGCGC 2040

GGCGCTGGTC GGCGACCGGA TCATCGAGAT CGGGCACAAC CGCA0G6TCG CCTGGAGCCA 2100

CACCGTCTCC ACCGCOOGCC GGTTCGTGTG GCACCGCCTG AGCCTCQTGC CaSQOGACCC 2160

CACCTCCTAT TA0GTCX3ACG GC0Q0CCCX3A GCGGATGGQC GCCOGCACGO TCACQGTCCA 2220

6ACCGGCAGC G6CC0GGTCA GCC90CACCTT CCACGACACC CGCTACGGCC CGGTGGCOGT 2280

GGTGCCGGGC AC C TTC G ACT GGACGCOC36C CACXX3CGTAC GCCATCACCG ACGTCAACGC 2340

GG6CAACAAC CX3CGCCTTCG ACGGGTGGCT GC6GATGG6C CAGGCCAAGG ACGTCCGG6C 2400

GCTCAAGGCG GTCCTCGACC GGCACCAOTT CCTGCCCTGG GTCAACQTGA TGQCCGCCGA 2460

CGOGCGGGGC GAGGCCCTCT ACGGCGATCA TTCGGTGQTC CCGCQGQTQA CCQQCGGQCT 2520

CGCTQCCGCC TGCATOCCGG CGCOQTTCCA GCOQCTCTAC GCCTCCAGCO GCCAOGCXSGT 2580

CCTOQACaaT TCOCGGTCGG ACTQCGCGCT CGGCGCOGAC CCCX3A0QCG0 GGGTCCCGGG 2640

CATTCT06GC CCGOCQAGCC TOCXGGTGCG GTTCCGCGAC GACTACGTCA CCAACTCCAA 2700

CGACAOTCAC TGGCTOaCCA GCCCGGCCGC CC06CTGGAA GGCTTOCGGC GGATCCTCG6 2760

CAACGAACGC AOCCOGCOCA GCCTGOGCAC COQGCTCGGG CTQGACCAGA TCCA6CAQCG 2820

CCTCGCCGGC A0G6ACGGTC TQCCOGOCAA GGGCTTCACC ACCXSCOCGGC TCTGGCAGGT 2880

CAT0TTCG6C AACCGQATGC ACXX3CGCC0A ACTGGTCOGC GACXSACCTGG TOQGGCTCTO 2940

COGCCOCCAO CCGACCGOGA CGGCCTG6AA G6GCQCX3ATC GTOGAOCTCA CCGCGGCCTG 3000
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CAOKJCGCTG TCCCGCTTCG ATOAGCXITaC COACCTOGAC AQCOOGGOCO CGCACCTCm 3060 

CACOQAGTTC GCCCTCGCGG OCXSOAATCAG OTTCOCCGAC ACCTT0C3AGG TGRCOGATCC 3U0 

GOTACQCACC CCGCOCCGTC TQAACACX31C OQATCCQOGG GTACXWAOGQ OQCTCGOOGA 3180 

CGtOTTGCAA CGGCTCGCCG QCATCCCCCT CGACC5CGAA0 CTOtSGROACA TTCACftCCCSA 3240 

CAGCCGOOGC QftACOOCOCA TCCCCATCCA CXSGTQGCCCX: GG3GRA0CAG GCACCTTCAA 3300 

C3GTGATCACC AACCCGCTCG TGCOGOGCGT aOQATACCCXJ CAQCSTOSTCC AOGOAACATC 3360 

OTTCGTGATO GOOCPTCGAAC TCGGOCCGCA CXMCCCGTOS GGAOOaCAOA TCCTCACCTA 3420 

TGCGCaOTCO AOGftACCCOA ACTCACCCTG GTACOCXOAC CaGACOOTGC TCTACTOGCG 3480 

G3U«3GGCTGG GACACCATCA AOTACACCGA GGCGCaOATC GCGOOCXJAOC CGAACCTGCG 3540 

COTCTACCGO GTGGCACAGC GGGOAOJCTa ACCCACOTCA COCCXaSCTCG GCCX2OTG0GG 3600 

GGOCGCASGO CC5CCGAT0GT CTCTGCATCX3 CCGGTCAGCC GGGGCCTGOG TCGAOCGGCG 3660 

GCGGCCGQTC GACGCCCXSCG TCOCGOOGCA OCOACTOOCT GAAGCGCCAQ GCXSTCQGCGG 3720 

CCCOGOGCAG GTTGTTGAAC ATCACGTACO COXSGCCGCC GTCGAGGATG CXX5GC3GAGGT 3780 

(jroCCAGCTC GQCATOCOTG TACACATOCC GOGCOCCGGT GATCCCGTQC AGCCGGTAAT 3840 

AGGCCATCGG OGTCAQACTG C5GGCX3CAGGA ACGGQTOGGC GOCGTOOGTC AGOTCCAQCT 3900 

CCTGGCACAA GCCCTTCGACC ACCTCXiTCCM OCCaCGGOCC GCGCGOCTCC CACAACAOCC 3960 

QGACACCGGC CGGCCGGCGC GCMOGGCGC AtJAACICACXS CAGTCGCGCG ATOGCGGOTT 4020 

CQCTCGGCCG GAAACTC6CC GGGCaCTGCA GGTCGACTCT AGAGGATCCC CCSGQGTACCG 4080 

AOCTCGAATT CGT ^^^^ 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3222 nucleotides 
(B) TYPE: nucleotide 
(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: polynucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

GCCGGCTGCA CGCOGAGATG GGCCGGACCG TGGT6ATGGT GCTGCACQAC CTGAGCCTGO 60 

CCGCCCGGTA CGCCGACCGQ CTGATOGCGA TGAAGGACGG COGGATCGTG GCGAGCQGGG 120 

CGCCGGACGA GGTCCTCACC CCGGOGCTGC TOGAOTCaGT CTTCGGGCTO CGCGCQATGa 180 

TOCTTGCCCGA CCCGGCGACC GGCACCCCGC TOGTGATCCC CCTQCCGCGC ACCGCCACCT 240 

COGTOCGGOC CTQAAATCGft TGAGCGTGOT TGCTTCATCO GCCT8GCQA0 0GAT6AGAGT 300 

ATQTQGOCQQ TAOAOCGAGT CCCGAGGGGG AGAT6C06CC GTGA.CGTCCT CGTACATOCO 360 

CCTGAAAGCA GCAGOGATCO CCTTCXSaTOT QATCGTGGCQ ACOGCAACCO TQCCGTCACC 420 

CGCTTCCGGC AG6GAACAT0 A06606GCTA TG06GCCCTG ATCC6CCX3GG GCTOQTACGG 480 

CGTCCCOCAC ATCACCQCCG ACX3ACTTCGO CSAGCCTCGGT TTCGGCXTTCa GGTACGTGCA 540 

GGCOC3AG6AC AACATCT6C0 TCATCGOCGA OAOCOTOOTG ACOGCCAACG QTGAGCaOTC 600 

GCQOTGGTTC 0GTGC6ACCG Q0CXX3QACQA OGCCGATGTO CGCAOCGACC TCXTCCACCO 660 

CAAGGCGATC GAGGACGGCQ TCOCCX3A6GQ OCrCCTCQAA GGOCCZCOQCO AC06CGTGC0 720 

0QCX3CCGTCG QAOGACGTrCC GGGAOCAGAT GCGCGGCTTC GTGQCCQGCT ACAACCACTT 780 

CCTAC6CCGC ACCX30CGTGC ACOGOCTQAC CGACOCGGCG TGCXXSGGGCA AGGCCTSC30T 840 

GCQCCCQCTC TCCX3A0ATCG ATCTCTGGCQ TACATTGTOa OACAGCATQS TCOGGGCCGG 900 

TTCmSGGCXI CTC3CTCGA0G GCATOGTCOC OGOGAOXrA CC6ACAGCCXS COQGGCCOGC 960 

QTCAOCCCOO 0AG6CACC0G ACXaCOGCCOC OATCGCOGCC 6CCCTGC3ACX3 GGACGAOCGC 1020 

GGGCATCGGC AGCAACGC6T ACGGCCTCGO OOOGCAGOCC ACCGTOAAOQ QCAOCGGQAT 1080 

GGTGCTQGCC AACCC6CACT TCCCGTGOCA GG0CX3CCQCA OQCTTCTACC GGATQCACCT 1140 

CAAGGTGCCC GGCC6CTA0Q ACGTCX3A06G OGOGGOGCrG GTCGGOQACC CGATCATCGA 1200 

GATC666CAC AACCGCACG6 TCGCCTGOAO CCACACCQTC TCCACCGGCC GCCGGriTOQT 1260 

0TQ6CACCGC CTOAGCCTGQ TQCXXX3G0QA CCCCACCTCC TATTACGTCG AC60CCGGCC 1320 

C6AGCGGATG CQCGOCCGCA CGGTCACX3GT CCAGACGG6C AG0G6CCCGG TCA0CCGC31C 1380 

CTTCCACGAC ACCCGCTACO GCCCGGTGGC OGTGATGCCG GGCACCTTGO ACTGOACGCC 1440 

GGCCACCGCG TACGCXUUCA CCGACGTCAA CGCGGOCAAC AACC6CGCCT TC6ACGGGTQ 1500 

GCIGOGGATG G6CCAG6CCA AGGACGTCCG GGOGCTCAAG OCGGTCCTCG ACCGGCACCA 1560 

GTTCCT6CX:C TGGGTCAACG TGATOGCCGC CGACX3GGCGG GGCQAGGCTC TCTACGGGGA 1620 

TCATTCOOTC QTCCCCCGGQ TGACOGOCGC OCTCGCTGCC GCCTQCATCC CGOOGCOOTT 1680 

CCAGOCGCTC TACQCCTGCA GGGGCCAGGC GGTCCTaOAC GGTICCCGGT CGGACTQOGC 1740 

GCTCG6CQCC GACCCC6A06 CCG06GTCCC GGGCAnCTC GOCCCOGOGA GCCTGG030T 1800 

GCGGTTCOGC GAGQACTACG TCACCAACTC CAAOGACAGT CACCGGCIGG CCAGCCCOGC 1860 

CGCCCCGCTQ OAAOGCTTCC CGCGGATOCT GGGCAAGOAA OQCACCCOGC GCAQCCTGOG 1920 

CACCCGGCTC GGGCTGQACC AGATCCAOCA GOGCXTTCGCC GGCACGGACX3 6TCTGCC0GG 1980 

CAAOQGCTTC ACCACCQCXT GGCTCTGOCA GGTCATGTTC GGCAACOGGA T6GA06GCX3C 2040 

GGAACTCGCC CQCGAGGACC TGGTCGC6CT CIGCCGCCGC CAGCOGACCG CGACCGCCTC 2100 

G2UIC0G0GCX} ATOOTOGAOC TCACCX3CGGC CTGCAOGGGG CTGTCCOGCT TCQATGAGCO 2160 

TGCGGACCT6 GACAGCCX9GG GGC3C6CACCT GTTCACCGAG TT06CCCT06 CXSGGOGOAAT 2220 

CAGGTTCOCC GACACCTTCQ AOaTQACCQA TCOGGTAC36C ACCCCGCQCC GTClt3AACAC 2280 

CACGGATCOG CGG6TACGC3A GG6C6CTCGC COACOCCGTG CAAC6GCT00 CGG6CATCCC 2340 

CCTCGACGCG AAGCTGGGAQ ACATCCACAC CGACAGOCOC QQCQAACQGC GCATCOCCAT 2400 

CCACGGTOGC CX30GG(3GAAO CAG6CACCTT CAACGT6ATC ACCAACCC6C TCGTGCCGGG 2460 

06TGGGATAC C0GCAQ6TCQ TCCACGGAAC ATGGTTCQTXS ATGGCCGTCG AACTCGOCCC 2520 

GCACX3GCCCX3 TC606AC6GC AGATCCTCAC CTATQCQCAO TCGACOAACC CQAACTCACC 2580 

CTGGTACGCX: GACCAQACCG TOCTCTACTC GGOKSAAGGOC TQOQACAOCA TCAAGTACAC 2640 

CGAOGCGCAG ATOGCOGCXXS ACOCQAACCT GOQGGTCTAC CGGOTGGCAC AGGGGQGACG 2700 

CTQACCCAOG TCACXSCOGGC TOGGCCOGTG OGGGOGOQCA 6GQ06CCQAT OQTCTCTQCA 2760 

TGQCCGGrrCA OCCGGGGCCT GCGnCOACCG GCGGCGGCCa GTCQAOGCCC 6CGTCCCGGC 2820 

GCAOCGACTB GCTGAAGCOC CAGGCGTCGQ C6GCC08GGG CAGGTTGITG AACATCACGT 2880 

AOQCOQGGCC OCOGTOQAGG ATOCCGOCOA OGTGTGOCAG CT0G6CATCC GTATACACAT 2940 

GCdSGGCGCC GGTOATGCCG T6CAGCCG6T AATAGGCXAT CQGCX3TCAGA CTGGGGOOCA 3000 

GGAAOGGGTC GGCGGCGTGQ GTCAGGTCCA GCTCCTGGCA CAAGCCCTCO ACCACCTOQT 3060 

CCG6CCAC0G GC06C6GGGC TCCCACAACA GCGGGACACC G6CCGGC0GG G606CTGGGG 3120 

GGCAGAACrC AOGCAGTGGC GOQATGGCGG GTTCGGTC36G CCGQAAACTC GCGGG6CACT 3180 

QCAG6TG6AC TCTAQAG6AT CCCC66GIAC CGA6CT0GAA TT 3222 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3193 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: polynucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

OATOGTOCTG CAOOACCTGA GCCTGGCCGC CCGGTACGCC GACOGGCTGA TCGCGATGAA 60 

G6ACGGCCGG ATCGTG6CGA 6CGGGGCGCC GGACGAGGTG CTCACCC06G CGCTGCTQTA 120 

GTCGGTCTTC GQGCTQCQCG COATOOTOGT QCCCGACCCG GCGACCGGCA CCCCGCTOQT 180 

OATCCCCCTG CCOOQCACCQ CCACCTCGOT QOGGGCCTOA AATOOATOAQ COTOGTTOCT 240 

TCATCCSGCCT GCCGAGCXSAT GAOAGTATOT GG6CXJGTAC3A GCXSAQTCTOQ AaOGGOAQAT 300 

GCCGCOQTGA CGTCCTCOTA CATGCGCCTG AAAOCAGCAO COATCGCCTT CGCnXSTCATC 360 

OIGGCOACOG CAQCCOTGCC GTCACCCQCT TCOGGCAGGO AACATOAOGG CGGCTATOCG 420 

GCCCTGATCC GCOGGGCCTC GTACGGOGTC CCGCACATCA CCGCCOACGA CTTCGGQAGC 480 

CTOGGTTTCG GCGTCGGOTA CGTGCAOGCC GAGGAOUVCA TCTGCXSTCAT CQOCGAGAGC 540 

GTGGTAAOSG CCAACGGTGA GCOGTCGOQG TOGTTCQGTO CX5ACCG0GCC GGACGACGCC 600 

GATGTGCGCA GCGACCTCTT CCAOCGCAAQ GCGATCGAOG AGCGOGTCWC CXaAGCGGCTC 660 

CTCOAAGGOC CCCJGCX3A0GG CGTGCXSGGCQ CCOTCGOACG ACGTCCOGGA CCAGATGCJGC 720 

GGCTTC6TCX5 CCGGCTACAA CCACITCCTA CX3CCGCAC0G GCGTGCACCG CCTGACCGAC 780 

CCGOCGTGCC GCGGCAAGGC CTOOQTOOGC OCGCTCTCCG AGATCGATCT CTOGCOTACG 840 

TCOTGGQACA GCATGGTCCX} GQCXXOTTCC GGGGCGCTOC TCOACGGCAT COTCGCCGCG 900 

ACGCCACCTA CAOCCaCCGG GCOCGCGTCA GCCaXCAOG CACCCXSACGC COCOGCGATC 960 

GCCGCOGCCC TCGACQGGAC GAGCGOGGGC ATCGGCAGCA ACOCGTACGG CCT0GGCX5CQ 1020 

CAflOCCACCG TOAACGGCAG CGOGATOGTO CTOGOCAACK CGCaCTTCCC GTOGCAGGGC 1080 

QCCOAACGCT TCTACCGQAT GCAOCTCAAG GTOCCCJGGCC GCTACGACGT CGAGGGOGCG 1140 

GCX5CTGAT0G GCQACCXXSAT CATCGGQATC GGGCACAACC X3CA0G0TCGC CTGQAGCCAC 1200 

ACCGTCTCCA CCGOCOGCCG GTTCGTGTGG CACOGCCTCA GCCTCGTGCC CXK3CX3ACCCC 1260 

ACCTCCTATT ACGTCOACGO CCGGCCCGAG OGGATGCOOG OCCGCACGGT CACG6TCCAG 1320 

ACOGGCAGCG GCCCGQTCAG CCGCACCTTC CACGACACCC GCTACGGCCC GGTOGCCOTG 1380 

ATGCCGGGCA CCTTCGACTQ GACGCCGGCX: ACCGOGTACG CCATCACJCOA CGTCAACGOG 1440 

GGCAACAACC GCGCCTTCGA OGGGTGGCTG CGGATOGGCC AGGCCAAGGA CGTCCOGGCG 1500 

CTCAAGGOGG TCCTCGACCG GCACCAGTTC CTOCCCTGQG TCAACGXGAT CGCCX3CC3GAC 1560 

GCOCGGGGCG AGGCCCTCEA CGGCGATCAT TCGGTCGTCC CCCX3GGTGAC COGOGCOCTC 1620 

GCTGCCGCCT GCATCCCGOC QCCGTTOCAG CXrOCTCTACG CCTCCAGCGG CCAGOCGGTC 1680 

CTQQACGGTT CCXGGTOGGA CTGCGCGCTC GGCGOCGACC CCGAOGCCXSC GGTCCCGGGC 1740 

ATTCTCGGCC COGCQAGCCT GCCGGTGCGG TTOOGOQAOS ACTACGTCAC CAACTCXMC 1800 

GACAGTCACT GGCTCGCCAG CCCGGCCGCC CCGCIGGAAO GCTTCCCGOG GATCCTCGGC 1860 

AACGAACGCA CCCCGCGCAO CCTGG6CACC CG6CTC6GGC T6GACCAGAT CC3U3CA606C 1920 

CTCGCCOGCA CGC2A0G0TCT GCCOQGCAAG GOCTTCACCA CCGCCCQQCT CTGGCA6GTC 1980 

ATGTTCGGCA ACCOGATGCA CGGOCSCCCSAA CTC0CCCX3GQ AOGACCTOOT CX300CTCTGC 2040 

C6CCGCCAGC CGACCGCOAC CGCCICGAAC GGCGOGATCG TCGACCTCAC C6CGGCCTGC 2100 

ACGOCQCTOT CCCQCTTCGA TGAaOGTGCC OACCTOGACA GCCQ6Q0CQC GGACCTOTIC 2160 

ACCOAGTTCQ CCCTCGCGGa CGOAATCAGG TTCGCCGACA CCTTCGAGGT GACCX3ATC0G 2220 

GTACGCACCC CGOGCOGTCT GAACACCAOG GATCCX3CGG6 TACGGAGOGC 6CTCQC0GAC 2280 

OCOGTGCAAC GGCTOQCOSG CATCCCCCTC GACGGGAAGC T66GAGACAT CCACACOGAC 2340 

AGCCGCQGCQ AAOQQCGCAT CCCCATCCAC GOTGGCCGCO GGGAAGCAGG CACCTTCAAC 2400 

GTGATCACCA ACCCGCTCGT GCCG6GCGTG GGATAOCCGC A66TCGTCCA OGGAACATGG 2460 

TTCGTQATGG CCGTCGAACT CGGCCCGCAC G6CCCGTCGG GACGGCAGAT CCTCACXTTAT 2520 

GCGCAGTCGA CGAACCOGAA CTCACCCTGO TAOGCCGACC AGACGGTGCT CTACTCG06G 2580 

AAGGGCTG6G ACACCATCAA GTACACOGAG GCGCAGATCG CGGCCGACCC GAACCTGC6C 2640 

GTCTACCX3GQ TGGCACAGCG GGGACGCT6A CCCACGTCAC GCCGGCTOGG CXXXnXSCGGQ 2700 

G6CGCAGG6C GC06ATOGTC TCTGCATCGC C6GTCAGC0G G66CCTGCGT CGACGGGGG6 2760 

CGGCCGGTCX3 ACGCCCGCGT CCCG6CGCAG CGACTGGCTG AAGCGCCAGG GGTCGGCGGC 2820 

CCGGGGCAGO TIQTTGAACA TCACQTACGC 06GGC0GCGG TCGAGQATGC CGGCQAGGTQ 2880 

TGCCAGCTCO GCATCCX5TAT ACACATGCCG GGCGCCGGTG ATGCCGTGCA GCCGGTAATA 2940 

GGCCATOGGC GTCAGACT6C GQCGCAGGAA CGGGTOGGCG GCGTGGGTCA GGTCCAGCTC 3000 

CTGGCACAAG CCCTOGACCA CCTCGTCCX3G CCA06GGCGG OGCGGCTCCC ACAACAGCCG 3060 

QACACCGGCC GQCGGGCGCG CTOGGGGGCA GAACTCAC6C AGT06CGCGA TCGGGGGTTC 3120 

GGTCGGCCGG AAACTCQCCG QGCACTGCAG GTGGACTCTA GAOGATCCCC GOQTACOGAG 3180 

CTCXJAATTCQ TTA 3193 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3193 nucleotides 

(B) TYPE: nucleotide 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: polynucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

ATOGATGAGC OTGGTTGCTT CATCQGCCTG COGAOOQA.TO AGAOTATOTO QQCQQTAGAQ 60 

CXSAiGTCTCGA 6GGQGA6ATG CCGCOGTGAC GTCCTCGTAC AT6CGCCTGA AAGCASCA6C 120 

QATCGCCTTC QOTQTGATCQ TGGCQACOGC AQCOGTGCOG TCACCCGCTT GCGGCAOGGA 180 

ACA1T3A0QGC GGCTATQOQG CCCTGATCC6 CCGGGCCTCO TA06G0QTCC CGCACATCAC 240 

GGCCGACGAC TTCaGGAQCC TCGOTTTCGG CGTCGGGTAC GT6CAGGCCG AGGACAACAT 300 

CTGCGTCATC 6CCX»USAGCG TGOTGAOGQC CAACGGTQAQ C6GTCX3GQC3T GGTTCQGTQC 360 

GACCGQGCCG GACQACX3CCG ATQTSCaCAG OGACCTCTTC CACCGCAAQO OQATCGACGA 420 

CC360GTOGCX: QAGCGGCTCC TCG2UU3GGCC CCX3CGACGGC GTGCOGGOGC CX3TCGGA0GA 480 

CGTCCGGGAC CA6ATGCGCG GCTTCCrTCGC CQGCTACAAC CACTTCCTAC GCCGCACCOG 540 

CGTGCACCGC CTGACGQACC CX36C6TQCCG GG6CAAQ6CC TOOaiQCQCC CC3CTCTCCQA 600 

GATCGATCTC TGG06TACGT CX3TGGGACAG CATGGTCCGG GGC6GTTC0G GGGCXSCTGCT 660 

CQACG6CATC 0TCGCCGC6A C6CCACCQAC AGCCX3CCQGG CCCGGGTCAG CCCOQGAQGC 720 

ACCCGACGCC GCC6CGATCG CCGCCX3CCCT CGACX»K3ACG AGCGCOGGCA TCOGCAGCAA 780 

CXKX3TACGGC CTCX30CGCGC AGGCCACXXST QAACGGCAGC GGGATGGTGC TGGCOVAOCC 840 

GCACTTCCXX3 TGGCAGGGCG CCisAACGCTT CTACC6GATG CACCTCAAGG TGCCC66CCQ 900 

CTACX3ACGTC GA0GGC300GG CQCTGATCGG CGACCCGATC ATCQAGATCG GGGACAACCO 960 

CACGGTCGCC TGGAGCCACA CCOTCTCCAC CGCCCGCGGG TTCGTGTQGC ACCGCCTGAG 1020 

CCTCGTGCCC GGGGACCCCA CCTCCTATTA CGTGQACGGC CXSGCCCGAGC GQATGCGGGC 1080 

CCGCACGGTC ACGGTCGAGA CCGGCAGCGG CCCGGTCAGC CX3CACCTTCC ACGACACCCG 1140 

CTACGGCCCG GTGGCCGTGG TGCCG60CAC CTTC6ACTGG ACGCCGGCCA CCG06TACGC 1200 

CATCACOGAC GTCAACQCGG OCAACAACOQ GGCXTTTCXSAC GGGTGGCTGC GGAT6GGCCA 1260 

GGCCAAGGAC GTCXX3GGCGC TCAAGG06GT CCTOaACOGO CACCAQTTCX: TGCCCTOGGT 1320 

GAACGTGATC GCCGCOGAOG CGCGGGGCGA GOCGCTCTAC GGOJATCATT CGGTCGTCCC 1380 

CCGGGTGACC GGOGCOCTCiO CTGCCGCCTG CATCCCGGCG COGTTCCAGC CGCTCIACGC 1440 

CTCCAGOGGC CAOGOGCrTCC TGGAOQOTTC CCX3GT0GGAC TGCQCX3CTG0 GOGCCQACCC 1500 

CGACX3CCGCG GTCCCGGGCA TTCTCGGCCC GGOQAOCCTG CCGGTGCGQT TCCGCGAOGA 1560 

CTA06TCACC AAGTCCAAC6 ACAQTCACTG GCTGGCCAOC CCGGCGGCCC COCTGQAAGG 1620 

CXTCCCGOGG ATCCTCGGCA ACGAACGCAC CCG6CGCAGC CTGCGCACCC GGCTCGGGCT 1680 

GGACCAGATC CAQCAGCGCC TCGCCGGCAC GGAGGGTCIG CCCGGCAAGG GCTTCACCAC 1740 

CGCCOGQCTC TGGCaGQTCA TQTTCGOCAA CCGGATQCAC aOCGCCQAAC TOQTCCOCGA 1800 

CGACCTGC3TC GCGCTCTGCC GCOGOCAGCC GAC0GCX3ACC GCCTOGAACG G0GC3GATCXPr 1860 

OGACCTCACC GCG6CCT0CA CGGCGCTGTC CCOCTTCOAT QAGCOTGCOO ACCTGOACAG 1920 

CCGOQQCGOG CACCTQTTCA CCGAGTTCGC CCTOGOGGOC GGAATC3iaOT TCGC03ACAC 1980 

CTTCGAGGTO ACCGATCCGG TACOCACCCC GCX3CCGTCTG AAOICXIACXSQ ATCOQCGOQT 2040 

AGGGACGGC6 CTCGCOGACG CCGTGCAACO GCT0GC05GC ATCCOCCTCG ACGCGAAOCT 2100 

GGOAGACATC CACACCC3ACA GCCGCGOCGA ACGQCGCATC CCCATCCAC6 GTOaCCGCaG 2160 

GGAAGCAGGC ACCTTCAACO TQATCACCAA COOQCTCGTG CCGGGOOTGG GATACCCGCA 2220 

GGTCQTCCAC GGAACATCGT TCGTGATQGC CGTCGAACTC GGCCOQCACO GCC0STCGC3G 2280 

ACaaCAOATC CTCACCTATO CGCAGTOGAC GAACCCGAAC TCACCCTGOT ACGCCGACCA 2340 

GAGCQTOCTC TACTCOCGOA AGGGCTGGGA CACCATCAAO TACSLCCQAGQ CQCAQATG6C 2400 

6GCC6ACCCG AACCTG06CG TCIACOQGOT (SGCACAGCGG GGACQCTQAC CCACGTCACG 2460 

COQGCTCGQC CCCfTOCJGGGG GGGCAGGGOG CC5GAT0GTCT CTGCATOGCC GGTCAGCCGQ 2520 

QGCCTGCGTC OACCXSQCGGC GGCCGGTCCSA CGCCCX30C3TC COOGCGCAGC QACTGGCTOA 2580 

AGCQCCA66C GTCGGCGGCC COGGGCAGGT TGTTGAACAT CAC3GTA0GCC CSGGCCOCXXST 2640 

COAGOATQCC (3QCGA0GTGT GCCAGCTGG6 CATCCGTGTA CACATGCCGO GOGCCGGTGA 2700 

TQCCXSTGCAG CCGGTAATAG GCCATCGGCG TCAGACTGCG GGGCAGGAAC GQGrTCGGCGG 2760 

COTGGGTCAG GTCCAGCTCC TGGCACAAGC CCTOQACCAC CTCOTCCGGC CAOGGGCCGC 2820 

GCGGCTCCCA CAACAGCOQG ACACCGGCCG QCCGGCGCGC TOSGGOGCAG AACTCACGCA 2880 

OTOOCGCGAT GGCGGGTTCG GTCGGCC6GA AACTC0CCX3G 0CACT6CAG 2929 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 782 amino acids 

(B) TYPE: amino acid 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly Val Ile Val Ala Thr 
5 10 15 

Ala Thr Val Pro Ser Pro Ala Ser Gly Arg Glu His Asp Gly Gly Tyr 
20 25 30 

Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val Pro His Ile Thr Ala 
35 40 45 

Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly Tyr Val Gln Ala Glu 
50 55 60 

Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val Thr Ala Asn Gly Glu 
65 70 75 80 

Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp Asp Ala Asp Val Arg 
85 90 95 

Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp Arg Val Ala Glu Arg 
100 105 110 

Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala Pro Ser Asp Asp Val 
115 120 125 

Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr Asn His Phe Leu Arg 
130 135 140 

Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala Cys Arg Gly Lys Ala 
145 150 155 160 

Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp Arg Thr Leu Trp Asp 
165 170 175 

Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu Asp Gly Ile Val Ala 
180 185 190 

Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser Ala Pro Glu Ala Pro 
195 200 205 

Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly Thr Ser Ala Gly Ile 
210 215 220 

Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala Thr Val Asn Gly Ser 
225 230 235 240 

Gly Met Val Leu Ala Asn Pro His Phe Pro Trp Gln Gly Ala Ala Arg 
245 250 255 

Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg Tyr Asp Val Glu Gly 
260 265 270 

Ala Ala Leu Val Gly Asp Pro Ile Ile Glu Ile Gly His Asn Arg Thr 
275 280 285 

Val Ala Trp Ser His Thr Val Ser Thr Ala Arg Arg Phe Val Trp His 
290 295 300 

Arg Leu Ser Leu Val Pro Gly Asp Pro Thr Ser Tyr Tyr Val Asp Gly 
305 310 315 320 

Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr Val Gln Thr Gly Ser 
325 330 335 

Gly Pro Val Ser Arg Thr Phe His Asp Thr Arg Tyr Gly Pro Val Ala 
340 345 350 



Val Met Pro Gly Thr Phe Asp Trp Thr Pro Ala Thr Ala Tyr Ala Ile 
355 360 365 

Thr Asp Val Asn Ala Gly Asn Asn Arg Ala Phe Asp Gly Trp Leu Arg 
370 375 380 

Met Gly Gln Ala Lys Asp Val Arg Ala Leu Lys Ala Val Leu Asp Arg 
385 390 395 400 

His Gln Phe Leu Pro Trp Val Asn Val Ile Ala Ala Asp Ala Arg Gly 
405 410 415 

Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro Arg Val Thr Gly Ala 
420 425 430 

Leu Ala Ala Ala Cys Ile Pro Ala Pro Phe Gln Pro Leu Tyr Ala Ser 
435 440 445 

Ser Gly Gln Ala Val Leu Asp Gly Ser Arg Ser Asp Cys Ala Leu Gly 
450 455 460 

Ala Asp Pro Asp Ala Ala Val Pro Gly Ile Leu Gly Pro Ala Ser Leu 
465 470 475 480 

Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn Ser Asn Asp Ser His 
485 490 495 

Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly Phe Pro Arg Ile Leu 
500 505 510 

Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr Arg Leu Gly Leu Asp 
515 520 525 

Gln Ile Gln Gln Arg Leu Ala Gly Thr Asp Gly Leu Pro Gly Lys Gly 
530 535 540 

Phe Thr Thr Ala Arg Leu Trp Gln Val Met Phe Gly Asn Arg Met His 
545 550 555 560 

Gly Ala Glu Leu Ala Arg Asp Asp Leu Val Ala Leu Cys Arg Arg Gln 
565 570 575 

Pro Thr Ala Thr Ala Ser Asn Gly Ala Ile Val Asp Leu Thr Ala Ala 
580 585 590 

Cys Thr Ala Leu Ser Arg Phe Asp Glu Arg Ala Asp Leu Asp Ser Arg 
595 600 605 

Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala Gly Gly Ile Arg Phe 
610 615 620 

Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg Thr Pro Arg Arg Leu 
625 630 635 640 

Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu Ala Asp Ala Val Gln 
645 650 655 

Arg Leu Ala Gly Ile Pro Leu Asp Ala Lys Leu Gly Asp Ile His Thr 
660 665 670 

Asp Ser Arg Gly Glu Arg Arg Ile Pro Ile His Gly Gly Arg Gly Glu 
675 680 685 

Ala Gly Thr Phe Asn Val Ile Thr Asn Pro Leu Val Pro Gly Val Gly 
690 695 700 

Tyr Pro Gln Val Val His Gly Thr Ser Phe Val Met Ala Val Glu Leu 
705 710 715 720 

Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu Thr Tyr Ala Gln Ser 
725 730 735 

Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln Thr Val Leu Tyr Ser 
740 745 750 

Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu Ala Gln Ile Ala Ala 
755 760 765 

Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln Arg Gly Arg 
770 775 780 782 



(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 782 amino acids 

(B) TYPE: amino acid 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly Val Ile Val Ala Thr 
5 10 15 

Ala Ala Val Pro Ser Pro Ala Ser Gly Arg Glu His Asp Gly Gly Tyr 
20 25 30 

Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val Pro His Ile Thr Ala 
35 40 45 

Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly Tyr Val Gln Ala Glu 
50 55 60 

Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val Thr Ala Asn Gly Glu 
65 70 75 80 

Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp Asp Ala Asp Val Arg 
85 90 95 

Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp Arg Val Ala Glu Arg 
100 105 110 



Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala Pro Ser Asp Asp Val 
115 120 125 

Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr Asn His Phe Leu Arg 
130 135 140 

Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala Cys Arg Gly Lys Ala 
145 150 155 160 

Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp Arg Thr Ser Trp Asp 
165 170 175 

Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu Asp Gly Ile Val Ala 
180 185 190 

Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser Ala Pro Glu Ala Pro 
195 200 205 

Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly Thr Ser Ala Gly Ile 
210 215 220 

Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala Thr Val Asn Gly Ser 
225 230 235 240 

Gly Met Val Leu Ala Asn Pro His Phe Pro Trp Gln Gly Ala Glu Arg 
245 250 255 

Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg Tyr Asp Val Glu Gly 
260 265 270 

Ala Ala Leu Ile Gly Asp Pro Ile Ile Gly Ile Gly His Asn Arg Thr 
275 280 285 

Val Ala Trp Ser His Thr Val Ser Thr Ala Arg Arg Phe Val Trp His 
290 295 300 

Arg Leu Ser Leu Val Pro Gly Asp Pro Thr Ser Tyr Tyr Val Asp Gly 
305 310 315 320 

Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr Val Gln Thr Gly Ser 
325 330 335 

Gly Pro Val Ser Arg Thr Phe His Asp Thr Arg Tyr Gly Pro Val Ala 
340 345 350 

Val Met Pro Gly Thr Phe Asp Trp Thr Pro Ala Thr Ala Tyr Ala Ile 
355 360 365 

Thr Asp Val Asn Ala Gly Asn Asn Arg Ala Phe Asp Gly Trp Leu Arg 
370 375 380 

Met Gly Gln Ala Lys Asp Val Arg Ala Leu Lys Ala Val Leu Asp Arg 
385 390 395 400 

His Gln Phe Leu Pro Trp Val Asn Val Ile Ala Ala Asp Ala Arg Gly 
405 410 415 

Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro Arg Val Thr Gly Ala 
420 425 430 

Leu Ala Ala Ala Cys Ile Pro Ala Pro Phe Gln Pro Leu Tyr Ala Ser 
435 440 445 

Ser Gly Gln Ala Val Leu Asp Gly Ser Arg Ser Asp Cys Ala Leu Gly 
450 455 460 

Ala Asp Pro Asp Ala Ala Val Pro Gly Ile Leu Gly Pro Ala Ser Leu 
465 470 475 480 

Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn Ser Asn Asp Ser His 
485 490 495 

Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly Phe Pro Arg Ile Leu 
500 505 510 

Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr Arg Leu Gly Leu Asp 
515 520 525 

Gln Ile Gln Gln Arg Leu Ala Gly Thr Asp Gly Leu Pro Gly Lys Gly 
530 535 540 

Phe Thr Thr Ala Arg Leu Trp Gln Val Met Phe Gly Asn Arg Met His 
545 550 555 560 

Gly Ala Glu Leu Ala Arg Asp Asp Leu Val Ala Leu Cys Arg Arg Gln 
565 570 575 

Pro Thr Ala Thr Ala Ser Asn Gly Ala Ile Val Asp Leu Thr Ala Ala 
580 585 590 

Cys Thr Ala Leu Ser Arg Phe Asp Glu Arg Ala Asp Leu Asp Ser Arg 
595 600 605 

Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala Gly Gly Ile Arg Phe 
610 615 620 

Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg Thr Pro Arg Arg Leu 
625 630 635 640 

Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu Ala Asp Ala Val Gln 
645 650 655 

Arg Leu Ala Gly Ile Pro Leu Asp Ala Lys Leu Gly Asp Ile His Thr 
660 665 670 

Asp Ser Arg Gly Glu Arg Arg Ile Pro Ile His Gly Gly Arg Gly Glu 
675 680 685 

Ala Gly Thr Phe Asn Val Ile Thr Asn Pro Leu Val Pro Gly Val Gly 
690 695 700 

Tyr Pro Gln Val Val His Gly Thr Ser Phe Val Met Ala Val Glu Leu 
705 710 715 720 

Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu Thr Tyr Ala Gln Ser 
725 730 735 

Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln Thr Val Leu Tyr Ser 
740 745 750 

Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu Ala Gln Ile Ala Ala 
755 760 765 

Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln Arg Gly Arg 
770 775 780 782 



(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 782 amino acids 

(B) TYPE: amino acid 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly Val Ile Val Ala Thr 
5 10 15 

Ala Thr Val Pro Ser Pro Ala Ser Gly Arg Glu His Asp Gly Gly Tyr 
20 25 30 

Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val Pro His Ile Thr Ala 
35 40 45 

Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly Tyr Val Gln Ala Glu 
50 55 60 

Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val Thr Ala Asn Gly Glu 
65 70 75 80 

Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp Asp Ala Asp Val Arg 
85 90 95 

Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp Arg Val Ala Glu Arg 
100 105 110 

Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala Pro Ser Asp Asp Val 
115 120 125 

Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr Asn His Phe Leu Arg 
130 135 140 

Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala Cys Arg Gly Lys Ala 
145 150 155 160 

Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp Arg Thr Leu Trp Asp 
165 170 175 

Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu Asp Gly Ile Val Ala 
180 185 190 






Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser Ala Pro Glu Ala Pro 
195 200 205 

Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly Thr Ser Ala Gly Ile 
210 215 220 

Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala Thr Val Asn Gly Ser 
225 230 235 240 

Gly Met Val Leu Ala Asn Pro His Phe Pro Trp Gln Gly Ala Ala Arg 
245 250 255 

Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg Tyr Asp Val Glu Gly 
260 265 270 

Ala Ala Leu Val Gly Asp Pro Ile Ile Glu Ile Gly His Asn Arg Thr 
275 280 285 

Val Ala Trp Ser His Thr Val Ser Thr Ala Arg Arg Phe Val Trp His 
290 295 300 

Arg Leu Ser Leu Val Pro Gly Asp Pro Thr Ser Tyr Tyr Val Asp Gly 
305 310 315 320 

Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr Val Gln Thr Gly Ser 
325 330 335 

Gly Pro Val Ser Arg Thr Phe His Asp Thr Arg Tyr Gly Pro Val Ala 
340 345 350 

Val Val Pro Gly Thr Phe Asp Trp Thr Pro Ala Thr Ala Tyr Ala Ile 
355 360 365 

Thr Asp Val Asn Ala Gly Asn Asn Arg Ala Phe Asp Gly Trp Leu Arg 
370 375 380 

Met Gly Gln Ala Lys Asp Val Arg Ala Leu Lys Ala Val Leu Asp Arg 
385 390 395 400 

His Gln Phe Leu Pro Trp Val Asn Val Ile Ala Ala Asp Ala Arg Gly 
405 410 415 

Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro Arg Val Thr Gly Ala 
420 425 430 

Leu Ala Ala Ala Cys Ile Pro Ala Pro Phe Gln Pro Leu Tyr Ala Ser 
435 440 445 

Ser Gly Gln Ala Val Leu Asp Gly Ser Arg Ser Asp Cys Ala Leu Gly 
450 455 460 

Ala Asp Pro Asp Ala Ala Val Pro Gly Ile Leu Gly Pro Ala Ser Leu 
465 470 475 480 

Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn Ser Asn Asp Ser His 
485 490 495 

Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly Phe Pro Arg Ile Leu 
500 505 510 

Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr Arg Leu Gly Leu Asp 
515 520 525 

Gln Ile Gln Gln Arg Leu Ala Gly Thr Asp Gly Leu Pro Gly Lys Gly 
530 535 540 

Phe Thr Thr Ala Arg Leu Trp Gln Val Met Phe Gly Asn Arg Met His 
545 550 555 560 

Gly Ala Glu Leu Val Arg Asp Asp Leu Val Ala Leu Cys Arg Arg Gln 
565 570 575 

Pro Thr Ala Thr Ala Ser Asn Gly Ala Ile Val Asp Leu Thr Ala Ala 
580 585 590 

Cys Thr Ala Leu Ser Arg Phe Asp Glu Arg Ala Asp Leu Asp Ser Arg 
595 600 605 

Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala Gly Gly Ile Arg Phe 
610 615 620 

Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg Thr Pro Arg Arg Leu 
625 630 635 640 

Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu Ala Asp Ala Val Gln 
645 650 655 

Arg Leu Ala Gly Ile Pro Leu Asp Ala Lys Leu Gly Asp Ile His Thr 
660 665 670 

Asp Ser Arg Gly Glu Arg Arg Ile Pro Ile His Gly Gly Arg Gly Glu 
675 680 685 

Ala Gly Thr Phe Asn Val Ile Thr Asn Pro Leu Val Pro Gly Val Gly 
690 695 700 

Tyr Pro Gln Val Val His Gly Thr Ser Phe Val Met Ala Val Glu Leu 
705 710 715 720 

Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu Thr Tyr Ala Gln Ser 
725 730 735 

Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln Thr Val Leu Tyr Ser 
740 745 750 

Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu Ala Gln Ile Ala Ala 
755 760 765 

Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln Arg Gly Arg 
770 775 780 782 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 782 amino acids 
(B) TYPE: amino acid 
(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly Val Ile Val Ala Thr 
5 10 15 

Ala Thr Val Pro Ser Pro Ala Ser Gly Arg Glu His Asp Gly Gly Tyr 
20 25 30 

Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val Pro His Ile Thr Ala 
35 40 45 

Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly Tyr Val Gln Ala Glu 
50 55 60 

Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val Thr Ala Asn Gly Glu 
65 70 75 80 

Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp Asp Ala Asp Val Arg 
85 90 95 

Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp Arg Val Ala Glu Arg 
100 105 110 

Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala Pro Ser Asp Asp Val 
115 120 125 

Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr Asn His Phe Leu Arg 
130 135 140 

Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala Cys Arg Gly Lys Ala 
145 150 155 160 

Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp Arg Thr Leu Trp Asp 
165 170 175 

Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu Asp Gly Ile Val Ala 
180 185 190 

Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser Ala Pro Glu Ala Pro 
195 200 205 

Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly Thr Ser Ala Gly Ile 
210 215 220 

Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala Thr Val Asn Gly Ser 
225 230 235 240 

Gly Met Val Leu Ala Asn Pro His Phe Pro Trp Gln Gly Ala Ala Arg 
245 250 255 

Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg Tyr Asp Val Glu Gly 
260 265 270 

Ala Ala Leu Val Gly Asp Pro Ile Ile Glu Ile Gly His Asn Arg Thr 
275 280 285 

Val Ala Trp ser His Thr Val Ser Thr Ala Arg Arg Phe Val Trp His 
290 295 300 

Arg Leu Ser Leu Val Pro Gly Aap Pro Thr Ser Tyr Tyr Val Asp Gly 
305 310 315 320 

Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr Val Gin Thr Gly Ser 
325 330 335 

Gly Pro Val Ser Arg Thr Phe His Aap Thr Arg Tyr Gly Pro Val Ala 
340 345 350 

Val Val Pro Gly Thr Phe Asp Trp Thr Pro Ala Thr Ala Tyr Ala lie 
355 360 365 

Thr Asp Val Aan Ala Gly Asn Aen Arg Ala Phe Asp Gly Trp Leu Arg 
370 375 380 

Met Gly Gin Ala Lys Asp Val Arg Ala Leu Lys Ala Val Leu Asp Arg 
385 390 395 400 

His Qln Phe Leu Pro Trp Val Asn Val lie Ala Ala Asp Ala Arg Gly 
405 410 415 

Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro Arg Val Thr Gly Ala 
420 425 430 

Leu Ala Ala Ala Cys He Pro Ala Pro Phe Gin Pro Leu Tyr Ala Ser 
435 440 445 

Ser Gly Gin Ala Val Leu Asp Gly Ser Arg Ser Asp Cys Ala Leu Gly 
450 455 460 

Ala Asp Pro Asp Ala Ala Val Pro Gly He Leu Gly Pro Ala Ser Leu 
465 470 475 480 

Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn Ser Asn Asp Ser His 
485 490 495 

Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly Phe Pro Arg He Leu 
500 505 510 

Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr Arg Leu Gly Leu Asp 
515 520 525 

Qln lie Gin Gin Arg Leu Ala Gly Thr Asp Gly Leu Pro Gly Lys Gly 
530 535 540 

Phe Thr Thr Ala Arg Leu Trp Gin Val Met Phe Gly Asn Arg Met His 
545 550 555 560 

Gly Ala Glu Leu Val Arg Asp Asp Leu Val Ala Leu Cys Arg Arg Gin 
565 570 575 

Pro Thr Ala Thr Ala Ser Asn Gly Ala He Val Asp Leu Thr Ala Ala 
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580 585 590 

Cys Thr Ala Leu Ser Arg Phe Asp Olu Arg Ala Asp Leu Asp Ser Arg 
595 600 60S 

Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala Gly Oly lie Arg Phe 
610 615 620 

Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg Thr Pro Arg Arg Leu 
625 630 635 640 

Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu Ala Asp Ala Val Gin 
645 650 655 

Arg Leu Ala Oly lie Pro Leu Asp Ala Lys Leu Gly Asp lie His Thr 
660 665 670 

Asp Ser Arg Gly Glu Arg Arg He Pro He His Oly Gly Arg Oly Glu 
675 680 685 

Ala Gly Thr Phe Asn Val He Thr Asn Pro Leu Val Pro Gly Val Oly 
690 695 700 

Tyr Pro Gln Val Val His Gly Thr Ser Phe Val Met Ala Val Glu Leu
705 710 715 720

Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu Thr Tyr Ala Gln Ser
725 730 735 

Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln Thr Val Leu Tyr Ser
740 745 750 

Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu Ala Gln Ile Ala Ala
755 760 765 

Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln Arg Gly Arg
770 775 780 782 



(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 782 amino acids 

(B) TYPE: amino acid 

(C) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly Val Ile Val Ala Thr
5 10 15 

Ala Ala Val Pro Ser Pro Ala Ser Gly Arg Glu His Asp Gly Gly Tyr
20 25 30 

Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val Pro His Ile Thr Ala
35 40 45

Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly Tyr Val Gln Ala Glu
50 55 60 

Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val Thr Ala Asn Gly Glu
65 70 75 80 

Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp Asp Ala Asp Val Arg
85 90 95 

Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp Arg Val Ala Glu Arg
100 105 110 

Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala Pro Ser Asp Asp Val 
115 120 125

Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr Asn His Phe Leu Arg
130 135 140 

Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala Cys Arg Gly Lys Ala 
145 150 155 160 

Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp Arg Thr Ser Trp Asp
165 170 175 

Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu Asp Gly Ile Val Ala
180 185 190

Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser Ala Pro Glu Ala Pro 
195 200 205 

Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly Thr Ser Ala Gly Ile
210 215 220 

Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala Thr Val Asn Gly Ser
225 230 235 240 

Gly Met Val Leu Ala Asn Pro His Phe Pro Trp Gln Gly Ala Glu Arg
245 250 255 

Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg Tyr Asp Val Glu Gly 
260 265 270 

Ala Ala Leu Ile Gly Asp Pro Ile Ile Gly Ile Gly His Asn Arg Thr
275 280 285 

Val Ala Trp Ser His Thr Val Ser Thr Ala Arg Arg Phe Val Trp His 
290 295 300 

Arg Leu Ser Leu Val Pro Gly Asp Pro Thr Ser Tyr Tyr Val Asp Gly 
305 310 315 320 

Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr Val Gln Thr Gly Ser
325 330 335

Gly Pro Val Ser Arg Thr Phe His Asp Thr Arg Tyr Gly Pro Val Ala
340 345 350

Val Met Pro Gly Thr Phe Asp Trp Thr Pro Ala Thr Ala Tyr Ala Ile
355 360 365 

Thr Asp Val Asn Ala Gly Asn Asn Arg Ala Phe Asp Gly Trp Leu Arg
370 375 380 

Met Gly Gln Ala Lys Asp Val Arg Ala Leu Lys Ala Val Leu Asp Arg
385 390 395 400 

His Gln Phe Leu Pro Trp Val Asn Val Ile Ala Ala Asp Ala Arg Gly
405 410 415 

Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro Arg Val Thr Gly Ala
420 425 430 

Leu Ala Ala Ala Cys Ile Pro Ala Pro Phe Gln Pro Leu Tyr Ala Ser
435 440 445 

Ser Gly Gln Ala Val Leu Asp Gly Ser Arg Ser Asp Cys Ala Leu Gly
450 455 460 

Ala Asp Pro Asp Ala Ala Val Pro Gly Ile Leu Gly Pro Ala Ser Leu
465 470 475 480 

Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn Ser Asn Asp Ser His 
485 490 495

Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly Phe Pro Arg Ile Leu
500 505 510 

Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr Arg Leu Gly Leu Asp
515 520 525 

Gln Ile Gln Gln Arg Leu Ala Gly Thr Asp Gly Leu Pro Gly Lys Gly
530 535 540 

Phe Thr Thr Ala Arg Leu Trp Gln Val Met Phe Gly Asn Arg Met His
545 550 555 560 

Gly Ala Glu Leu Ala Arg Asp Asp Leu Val Ala Leu Cys Arg Arg Gln
565 570 575 

Pro Thr Ala Thr Ala Ser Asn Gly Ala Ile Val Asp Leu Thr Ala Ala
580 585 590 

Cys Thr Ala Leu Ser Arg Phe Asp Glu Arg Ala Asp Leu Asp Ser Arg
595 600 605 

Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala Gly Gly Ile Arg Phe
610 615 620

Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg Thr Pro Arg Arg Leu 
625 630 635 640

Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu Ala Asp Ala Val Gln
645 650 655 

Arg Leu Ala Gly Ile Pro Leu Asp Ala Lys Leu Gly Asp Ile His Thr
660 665 670

Asp Ser Arg Gly Glu Arg Arg Ile Pro Ile His Gly Gly Arg Gly Glu
675 680 685 

Ala Gly Thr Phe Asn Val Ile Thr Asn Pro Leu Val Pro Gly Val Gly
690 695 700 

Tyr Pro Gln Val Val His Gly Thr Ser Phe Val Met Ala Val Glu Leu
705 710 715 720

Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu Thr Tyr Ala Gln Ser
725 730 735 

Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln Thr Val Leu Tyr Ser
740 745 750 

Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu Ala Gln Ile Ala Ala
755 760 765 

Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln Arg Gly Arg
770 775 780 782 

CLAIMS

What is claimed is:

1. A gene which encodes an enzyme having ECB deacylase activity,
said gene comprising a nucleotide sequence which is selected from the group
consisting of SEQ. ID. NOS: 26, 27, 28, 29 and 30.

2. An enzyme which exhibits ECB deacylase activity, said enzyme 
having an amino acid sequence which is selected from the group consisting of 
SEQ. ID. NOS: 32, 33, 34, 35 and 36. 

3. An enzyme which exhibits ECB deacylase activity, said enzyme
being made by the method comprising the steps of: 

a) inserting into a vector a double-stranded mutagenized 
polynucleotide having the SEQ. ID. NOS: 26, 27, 28, 29 or 30 to form an
expression vector, said mutagenized polynucleotide encoding an enzyme;

b) transforming a host cell with said expression vector; and

c) expressing the enzyme encoded by said mutagenized 
polynucleotide. 

4. A method for producing an enzyme comprising the steps of: 

a) inserting into a vector a double-stranded mutagenized 
polynucleotide having the SEQ. ID. NOS: 26, 27, 28, 29 or 30 to form an 
expression vector, said mutagenized polynucleotide encoding an enzyme;

b) transforming a host cell with said expression vector; and

c) expressing the enzyme encoded by said mutagenized
polynucleotide. 